PREFACE


Sampling Statistics and Applications is the second volume of 
Fundamentals of the Theory of Statistics; this second volume is 
intended for advanced students or research workers. The first 
volume is entitled Elementary Statistics and Applications and is 
designed for beginning courses in statistics. 

After reviewing the basic concepts and definitions in Sampling 
Statistics and Applications, the authors discuss the general theory 
of frequency curves and the theory of random sampling. Impor- 
tant sampling distributions are derived, and their applications to 
a variety of problems are illustrated. Exact methods applicable 
primarily to normal populations and approximate methods used 
in sampling from discrete populations and from continuous non- 
normal populations are considered. Theoretical discussion is 
illustrated throughout to show real-life applications. The 
character of assumptions involved in theory is explicitly treated; 
the problems that such assumptions present when the statistician
is confronted with practical applications are illustrated. 

The arrangement of the material in the book is based partly on 
grounds of logic and partly on practical considerations. The 
theory of frequency curves is general in scope and is therefore 
presented first. The theory of sampling, being an elaboration 
of a special part of the theory of frequency curves, is discussed 
after the more general theory. Within the discussion of the 
theory of sampling, the more elementary aspects are examined 
before approaching complex problems. This leads to a separa- 
tion of some material that might logically be run together. It 
gives the instructor greater freedom, however, in the selection of 
material appropriate for the level of his class, while the arrange- 
ment is such that he can assign material in logical sequence if he 
so desires.

For the most part theory and application have been discussed 
together in this volume. In certain cases, however, in which 
theoretical discussion is especially elaborate, its applications to
practical problems are treated separately. Numerical calculations
for frequency curves, for example, are discussed in a separate
chapter following the chapters devoted to the theory of frequency 
curves. Again, the uses of the sampling distributions of the 
mean, standard deviation, etc., are discussed in a chapter 
separate from that in which the theoretical derivation of these 
distributions is presented. On the whole, the arrangement of
material is designed to facilitate the use of the book in the school 
or research laboratory as well as in the classroom. 

Progress in sampling theory and its application has been rapid 
during the past 25 years. Much has been accomplished in 
deriving exact sampling distributions for important statistics 
and in clarifying the assumptions on which statistical analysis
rests. Discussion has also been active regarding the concept of
probability likely to be most fruitful for statistical research, and
increasing attention has been paid to the logic of statistical
inference. This development of theory has been accompanied
by progress in method and application. The discovery of more 
exact sampling distributions, for example, has led to greater 
emphasis upon careful design of experiments and less emphasis 
upon size of sample. Quality of method has, in many instances, 
displaced reliance only on quantity of numbers. 

Sampling Statistics and Applications represents an attempt to 
coordinate the new theory and applications with the old, to 
place the whole upon a logically consistent basis, and to put the 
subdivisions in proper relation to each other. To this end a 
concept of probability that at present appears most fruitful for 
statistical research was adopted, and the theory of sampling is 
explained in those terms. Elaboration of special techniques 
regarding the design of experiment and analysis of variance is 
not included since the primary intent is to emphasize funda- 
mentals of the theory of statistics. Explanations both of old 
and of new methods start with the fundamentals; and, unless 
otherwise indicated, a particular exposition includes all the 
argument. If a highly mathematical step of an argument is 
omitted or abbreviated, this is so stated. Some of the mathe- 
matical portions are included in the text; other more advanced 
mathematical material is placed in appendixes to some of the 
chapters. These mathematical parts are presented in such a 
way as to be readily teachable. 




The authors have drawn freely upon the many monographs 
and the periodical literature that have appeared during recent 
years. Care has been exercised to make acknowledgment in 
footnotes to the sources of new ideas that have been incorporated 
into the authors’ own development of the subject. To all these
vigorous workers in the field, too numerous to be listed by name, 
the authors are greatly indebted. The authors especially 
acknowledge a debt of gratitude to John H. Smith of the Bureau 
of Labor Statistics, who contributed many stimulating criticisms
and suggestions that inspired important improvements in this 
book. The authors are grateful to the International Finance 
Section of Princeton University for the financial assistance that 
was given Acheson J. Duncan some years ago to study statistics
and mathematical economics with the late Henry Schultz of 
the University of Chicago and with Harold Hotelling of Columbia 
University. The authors are indebted also to those men for help 
and guidance in the study of statistics. 

Naturally it is not to be supposed that the whole or any part 
of the manuscript carries the endorsement of the authors’ former 
teachers or those who have helped with criticisms of the manu- 
script. The authors assume full responsibility for any errors of 
theory or calculations that may be present in the volume. 

They are indebted to Prof. R. A. Fisher, also to Oliver & Boyd, 
Ltd., of Edinburgh, for permission to reprint Tables III and IV 
from their book Statistical Methods for Research Workers. As 
indicated in notes making specific acknowledgment in the text 
and in the Appendix, the authors are also indebted to others for 
reproductions or abridgments of tables of several other sampling 
distributions. 

JAMES G. SMITH,
ACHESON J. DUNCAN.
PRINCETON, N.J.,
March, 1945.
INTRODUCTION


CHAPTER I 
DEFINITIONS AND BASIC CONCEPTS 


Frequency Distributions. Time-honored observation has
made commonplace the eighteenth-century discovery that static
variability in almost any conceivable attribute of an object,
event, or condition follows a surprisingly uniform pattern.¹
The uniform pattern of static variability is revealed by arranging 
the variable in a frequency distribution or frequency series, which 
is simply a summary of an array of the variable arranged from 
smallest to largest. It is also called a “monovariate.” When
graphed, the figure is called a “histogram,” “frequency polygon,”
or “frequency curve,” depending upon the manner of graphing. 
Two types of frequency series are to be carefully distinguished, 
discrete frequency series and continuous frequency series. 
Discrete Frequency Series. A discrete frequency series is one
in which the variation, by the nature of the variable, is in distinct
steps. The variation in size of men's shoes occurs by distinct
steps, not by infinitesimal differences. Accordingly, a frequency
series describing the number of men's shoes of various sizes is a
discrete frequency series. The same statement could be made
with respect to almost any item of clothing manufactured in
sizes for mass consumption.

The graph of a discrete frequency series should be in the form 
of a histogram, not a frequency curve; for the latter would suggest
continuous variability, which is contrary to fact in the case
of a discrete series. 

Continuous Frequency Series. A continuous series is one 
representing a phenomenon that varies by infinitesimal amounts. 

¹ Static variability refers to variability which is not correlated with time
or in which time is “held constant.” Variability correlated with time, i.e.,
dynamic variability, may assume a great variety of patterns. See SMITH,
J. G., and A. J. Duncan, Elementary Statistics and Applications, Chaps. V
and XIX.
For example, the frequency distribution of weights or heights of
people of some specified age is continuous in character; for the
differences among a large number of people in weight or in height
are in fact by infinitesimal amounts. A continuous series may
have the appearance in a frequency table of the same discreteness
as a discrete series; but this is because the arbitrarily discrete
character of the unit of measurement eclipses the actual continuous
character of the data.

Fig. 1.—Maternal mortality in cities of 100,000 or more population in the United
States in 1938. (Deaths per 1,000 live births.)

The graph of a continuous frequency series may appropriately 
be in the form of a frequency curve, but often when the number 
of cases observed is comparatively few a continuous frequency 
series is pictured by a histogram. 

Histograms and Frequency Curves. Figure 1 is an illustration
of a histogram; it is a graph of the frequency distribution presented
in the first two columns of Table 1. In this figure the
frequency of any class interval is represented by a rectangle
erected on that interval as a base and with a height equal to the
observed frequency.
For many types of analysis it is preferable to present the frequency
distribution and the corresponding histogram in a form
in which the area of a rectangle represents the proportional frequency
of an interval. Figure 2 is an illustration of such a
histogram; it is a graph of the third column of Table 1, with the
same scale of variation as that used in Fig. 1, namely, that shown
in column (1) of Table 1.


TABLE 1.—MATERNAL MORTALITY IN CITIES OF 100,000 OR MORE
POPULATION IN THE UNITED STATES IN 1938

   (1)                  (2)           (3)
   Deaths per 1,000     Number of     Proportional
   live births          cities        frequency
   X                    F             F/N
   1-                    2            .022
   2-                   16            .172
   3-                   18            .193
   4-                   20            .215
   5-                   15            .161
   6-                   10            .108
   7-                    4            .043
   8-                    6            .064
   9-                    0            .000
   10-                   2            .022
   Total            N = 93           1.000


In histograms of the type shown in Fig. 2, the total area of the 
histogram always equals 1, and each of the bars is a portion of 1. 
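As a check on the statement that an area histogram has total area 1, the following Python sketch turns the counts of Table 1 into proportional frequencies. It is a minimal illustration, assuming a class interval of 1; the name `proportional_frequencies` is illustrative, not from the text. Proportions computed directly can differ from the printed column (3) in the last decimal place; 18/93, for instance, rounds to .194.

```python
# Table 1: lower class limits (deaths per 1,000 live births) and number of cities F.
lower_limits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]
CLASS_WIDTH = 1.0  # every class interval is one unit wide

def proportional_frequencies(freqs):
    """Return F/N for each class (column (3) of Table 1, before rounding)."""
    n = sum(freqs)
    return [f / n for f in freqs]

props = proportional_frequencies(freqs)

# In an area histogram each bar's AREA is its proportion, so its height is
# proportion / class width; adding up height * width gives the total area.
heights = [p / CLASS_WIDTH for p in props]
total_area = sum(h * CLASS_WIDTH for h in heights)

print(sum(freqs))                    # N
print([round(p, 3) for p in props])  # proportions to three places
print(total_area)                    # 1, apart from floating-point rounding
```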

Now suppose that the data from which the histogram has been
constructed were a sample from a very large set of cases, theoretically
an infinite set. For example, the data might be the
heights of 100 adult males of the white race, instead of the
mortality statistics above illustrated. The 100 heights, then,
would be a sample of the heights of all adult men of that race,
presumably millions of men. If the size of the sample were
increased, the class interval could be reduced without causing
irregularities in the form of the plotted histogram. In fact, if
the number in the sample is made larger and larger and at the
same time the size of the class interval is continuously reduced,
the histogram will tend to become more and more regular and
the tops of the rectangles, which are getting narrower and
narrower, will come closer and closer to forming a smooth continuous
curve (a frequency curve).
Fig. 2.—Maternal mortality in cities of 100,000 or more population in the United
States in 1938. (Deaths per 1,000 live births.)

Fig. 3.—A frequency curve.


In such a manner, the frequency curve may be viewed as
the limit that an area histogram of proportional frequencies
approaches as the number of cases is increased and the size of
the class interval is reduced indefinitely. The frequency curve
depicts the distribution of a theoretically infinite set of data,
with a theoretically infinitesimal class interval. Figure 3 illustrates
a frequency curve.
Being the limit approached by an area histogram of proportional
frequencies, the frequency curve has a total area between
the curve and the X-axis that is always equal to 1. Furthermore,
any section of area under the curve will give the relative frequency
of the cases falling within the class interval marking off
that section of area.

Populations, Parameters, and Statistics. To say that the
population of the United States is some hundred and thirty
million people is a familiar use of the word “population.” In
statistics the word is used in the same familiar sense, but it is also
used in a more general sense. In statistics the term “population”
(or sometimes “universe”) refers to the enumeration of
persons or animals of any kind or even to the enumeration of
inanimate things. In statistics, population refers to all the
objects of a defined kind in existence in some specified universe;
for example, all the adult males in the United States constitute
a population, as do all the automobiles in the United States or
all the three-year-old steers in the United States.

The measurements of characteristics of a population are called 
"parameters." The average height of all adult males in the 
United States is a parameter of that population. The average 
weight of all the three-year-old steers in the United States is a 
parameter of that population. Rarely are the parameters of any 
of these populations actually measured. In practice, it is much 
easier and more practical to estimate the parameter by taking 
the average or measuring the corresponding characteristic of a 
sample from the population in question. This latter measure 
is called a “statistic.” Thus parameters are the characteristics 
of the population, and the corresponding sample statistics are the 
measures of the corresponding characteristics of samples. 

Averages. In addition to the familiar arithmetic mean, other 
averages are used as devices for summarizing or comparing. The 
median and mode are often used in statistical analysis; less
frequently, but necessarily for certain purposes,¹ the geometric
mean and the harmonic mean are used. 

By definition the arithmetic mean is the sum of the variables
divided by the number of variables, i.e.,

$$\bar{X} = \frac{\Sigma X}{N} \qquad (1)$$


¹ Cf. SMITH and DUNCAN, op. cit., pp. 173-179.
When the variable is arranged in a frequency distribution, this
equation for the mean generally appears as

$$\bar{X} = \frac{\Sigma FX}{N} \qquad (1)$$

to represent the fact that the several class-interval values of X
have their respective frequencies, so that

$$\Sigma FX = F_1X_1 + F_2X_2 + \cdots + F_nX_n$$


It is important to note that, when the frequency distribution
is expressed proportionally, as in the third column of Table 1,
it becomes a distribution of probability.¹ The arithmetic mean
of a distribution of probability, i.e., of a proportional frequency
distribution, is equal to Σ(F/N)X. Expressed in the symbols of
probability, this equation is

$$\bar{X} = \Sigma P(X)X \qquad (2)$$


Table 2 illustrates the calculation of the arithmetic mean of the
frequency distribution shown in Table 1.

TABLE 2.—CALCULATION OF THE ARITHMETIC-MEAN MATERNAL MORTALITY
IN CITIES OF 100,000 OR MORE IN 1938

   X (deaths per    Mid-point of
   1,000 live       class
   births)          interval       F       FX       P(X)     P(X)X
   1-                 1.5           2       3.0     .022     .033
   2-                 2.5          16      40.0     .172     .430
   3-                 3.5          18      63.0     .193     .676
   4-                 4.5          20      90.0     .215     .968
   5-                 5.5          15      82.5     .161     .886
   6-                 6.5          10      65.0     .108     .702
   7-                 7.5           4      30.0     .043     .322
   8-                 8.5           6      51.0     .064     .544
   9-                 9.5           0        .0     .000     .000
   10-               10.5           2      21.0     .022     .231
   Total                           93     445.5    1.000    4.792

¹ See ibid., p. 254.
Accordingly,

$$\bar{X} = \frac{\Sigma FX}{N} = \frac{445.5}{93} = 4.79$$

or

$$\bar{X} = \Sigma P(X)X = 4.792$$

The sum of the column of FX’s must be divided by N (= 93)
in order to find the mean, whereas the sum of the column of
P(X)X’s is the mean. It will be noted that taking the mid-point
of each class interval as the X of that class interval is
equivalent to assuming that the frequencies within
each class interval are so distributed that their average is equal
to the mid-point. This assumption causes negligible error in
the calculation of the mean.¹
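The two routes to the mean in Table 2 can be sketched in Python as follows, assuming, as the text does, that each class is represented by its mid-point. The helper names are hypothetical. With unrounded proportions both routes give 445.5/93 ≈ 4.790; the 4.792 of Table 2 reflects the rounding of the P(X) and P(X)X columns to three decimals.

```python
# Table 2: class mid-points X and frequencies F (from Table 1).
midpoints = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]

def grouped_mean(midpoints, freqs):
    """Mean as sum(F*X)/N, each class being concentrated at its mid-point."""
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, midpoints)) / n

def probability_mean(midpoints, freqs):
    """Mean as sum(P(X)*X) with P(X) = F/N, equation (2)."""
    n = sum(freqs)
    return sum((f / n) * x for f, x in zip(freqs, midpoints))

sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
print(sum_fx)  # 445.5
print(round(grouped_mean(midpoints, freqs), 3))
print(round(probability_mean(midpoints, freqs), 3))
```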


Fig. 4.—Median and quartiles of mortality rates in selected cities of the United
States. (Q₁ = 3.3, Mi = 4.5, Q₃ = 5.9; each of the four segments contains
25 per cent of the cities.)

The median is a position average; by definition, the median 
is that value than which there is an equal number of cases larger 
and smaller. When the cases are arranged in an array, the 
median is either the value of the middle one (when there is an 
odd number of cases) or some value between the two middle ones 
(when there is an even number of cases). Ordinarily in the latter 
instance the arithmetic mean of the two middle cases is taken 
as the median value. The symbol for the median is Mi. 

The first quartile, Q₁, is the value below which one fourth of
the cases fall and above which three fourths of the cases fall.
The third quartile, Q₃, is that value below which three fourths
of the cases fall and above which one fourth of the cases fall.
The median, obviously, is equivalent to the second quartile.
If the cases are arranged in an array, it should be apparent from
the definitions of the median and the quartiles that these three
values divide the cases into four parts, with one fourth of the
cases in each part. This is illustrated graphically in Fig. 4.

¹ When the standard deviation is so calculated, as it usually is, it is necessary
to correct for error due to the assumption made; the correction is called
“Sheppard’s correction.” Cf. pp. 10, 12.
For the frequency distribution shown in Tables 1 and 2, the
median is found by interpolating the value of the middle (46½th)
case to be 4.53; Q₁ is found by interpolating the N/4 case to
be 3.3; and Q₃ is found by interpolating the 3N/4 case to be
5.9. As shown in Fig. 4, the significance of these three values is
that, in 1938, 25 per cent of the cities of 100,000 or more population
had a maternal death rate of less than 3.3 per 1,000 live
births; 25 per cent of the cities had maternal death rates between
3.3 and 4.5 per 1,000 live births; 25 per cent had maternal death
rates between 4.5 and 5.9 per 1,000 live births; and the remaining
25 per cent of the cities had maternal death rates greater than
5.9 per 1,000 live births.

If it were desired to obtain a more detailed description of the 
distribution of these cities with respect to maternal death rates, 
the distribution could be divided in 10 parts each containing 
10 per cent of the cities. The dividing points of these 10 parts 
would be called “deciles,” instead of quartiles; there would be 
9 deciles, and the fifth would be the same value as the median. 
Similarly, if with a larger number of cases it were desired to 
obtain a description of the distribution sufficiently refined to say 
within what limits lie each 1 per cent of the group of cities, the 
distribution could be divided in 100 parts each containing 1 per 
cent of the cities. The dividing points of these 100 parts would 
be called “percentiles” ; and there would be 99 percentiles, of 
which the fiftieth would be the same value as the median. 
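The interpolation used for the median and quartiles, and equally for deciles and percentiles, can be sketched as a single Python function. It assumes that the cases in each class interval are spread evenly across it; `grouped_quantile` is a hypothetical name, not a routine from the text.

```python
# Table 1 data; each class interval is 1 unit wide.
lower_limits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]
WIDTH = 1.0

def grouped_quantile(fraction, lower_limits, freqs, width):
    """Interpolate the value below which `fraction` of the N cases fall.

    Assumes the cases are spread evenly across each class interval, i.e.
    linear interpolation inside the class containing the target case.
    """
    n = sum(freqs)
    target = fraction * n  # N/2 for the median, N/4 for Q1, 3N/4 for Q3
    cum_below = 0
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum_below + f >= target:
            return lower + (target - cum_below) / f * width
        cum_below += f
    raise ValueError("fraction must lie between 0 and 1")

median = grouped_quantile(0.50, lower_limits, freqs, WIDTH)
q1 = grouped_quantile(0.25, lower_limits, freqs, WIDTH)
q3 = grouped_quantile(0.75, lower_limits, freqs, WIDTH)
# The nine deciles; the fifth one is the median.
deciles = [grouped_quantile(k / 10, lower_limits, freqs, WIDTH) for k in range(1, 10)]

print(round(q1, 2), round(median, 3), round(q3, 2))
```

For Table 1 this yields a median of 4.525 (4.53 to two places), Q₁ ≈ 3.29 and Q₃ ≈ 5.92, and the fifth decile coincides with the median.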

Another commonly used average, the mode, is described in
terms of relative frequency of occurrence. It is the magnitude
that occurs more frequently than any other. The mode is the
most probable value. When the data are presented in a frequency
distribution, it is necessary to interpolate for the mode.
The procedure may be illustrated by finding the mode in the
frequency distribution shown in Tables 1 and 2. It may be
assumed that the mode lies somewhere between 4 and 5, for
more cases (20) lie in that class interval than in any other of the
class intervals. The mode is equal to the lower limit of the modal
class interval plus the interpolated part of the class interval
established by the relationship of the frequencies above and
below that class interval. Thirty-six frequencies lie below the
modal class interval, and 37 frequencies lie above the modal class
interval. Hence the mode is equal to

$$4 + \frac{37}{37 + 36}(1) = 4.51$$
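The interpolation rule for the mode described above can be written out as a short Python sketch. It follows the text's rule, weighting by the total frequencies below and above the modal class; this is only one of several interpolation rules in use, and `interpolated_mode` is a hypothetical name.

```python
# Table 1 data; each class interval is 1 unit wide.
lower_limits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]
WIDTH = 1.0

def interpolated_mode(lower_limits, freqs, width):
    """Lower limit of the modal class plus above / (above + below) of the
    class interval, where `above` and `below` are the total frequencies
    lying beyond the modal class on either side."""
    modal = max(range(len(freqs)), key=lambda i: freqs[i])  # class with most cases
    below = sum(freqs[:modal])
    above = sum(freqs[modal + 1:])
    return lower_limits[modal] + above / (above + below) * width

mode = interpolated_mode(lower_limits, freqs, WIDTH)
print(round(mode, 2))
```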
The geometric mean is the nth root of the product of n variables
X. Thus, the geometric mean of 5, 8, and 25 is the cube root of
5 × 8 × 25 = 1,000, which is 10. The geometric mean may also be defined
as the antilogarithm of the arithmetic mean of the logarithms
of the variables X, i.e.,

$$\log \text{G.M.} = \frac{\Sigma \log X}{N} \qquad (3)$$

The harmonic mean is the reciprocal of the average of the
reciprocals of a variable magnitude $X_1, X_2, \ldots, X_N$, thus:

$$\text{H.M.} = \frac{1}{\frac{1}{N}\,\Sigma \frac{1}{X}} \qquad (4)$$
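Equations (3) and (4) translate directly into code. The Python sketch below, with hypothetical helper names, reproduces the example of 5, 8, and 25; because it works through floating-point logarithms, the geometric mean comes out only approximately 10.

```python
import math

def geometric_mean(values):
    """Antilogarithm of the mean of the (base-10) logarithms, equation (3)."""
    return 10 ** (sum(math.log10(x) for x in values) / len(values))

def harmonic_mean(values):
    """Reciprocal of the mean of the reciprocals, equation (4)."""
    return len(values) / sum(1 / x for x in values)

data = [5, 8, 25]
print(geometric_mean(data))  # approximately 10.0
print(harmonic_mean(data))   # 3 / (1/5 + 1/8 + 1/25)
```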

Moments. In statistics the term “moment” has been taken
over from physics. In physics, moment is a measure of a force
with respect to its tendency to produce rotation. The strength
of the tendency depends on the amount of force and the distance
from the origin of the point at which the force is applied. If the
arithmetic mean is taken as the origin in a frequency distribution
and the frequencies in each class interval (F₁, F₂, . . . , Fₙ) are
taken as the forces, at distances x₁, x₂, . . . , xₙ, the moments
(more exactly moment coefficients, since the physical moments
are divided by N) are defined as follows:

$$\mu_1 = \frac{\Sigma Fx}{N}, \qquad \mu_2 = \frac{\Sigma Fx^2}{N}, \qquad \mu_3 = \frac{\Sigma Fx^3}{N}, \qquad \mu_4 = \frac{\Sigma Fx^4}{N} \qquad (5)$$

in which x = X − X̄.
The Greek letter mu (μ) is used to describe the moments about
the arithmetic mean. When the deviations are measured about
an arbitrary origin, instead of about the arithmetic mean,
the symbol used to represent the moments is the Greek letter
nu (ν). Generally speaking, it is more convenient first to
calculate the moments about an arbitrary origin and by the use
of the following equations to obtain the moments about the mean:


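$$\begin{aligned}
\mu_1 &= \nu_1 - \nu_1 = 0 \\
\mu_2 &= \nu_2 - \nu_1^2 \\
\mu_3 &= \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 \\
\mu_4 &= \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4
\end{aligned} \qquad (6)$$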
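The conversion in equations (6) can be checked numerically. This Python sketch assumes the mid-points and frequencies of Tables 1 and 2 and an arbitrary origin of 4.5, chosen only for convenience; it computes the ν's, converts them to μ's, and compares the result with moments taken directly about the mean. No Sheppard correction is applied, and `moments_about` is a hypothetical helper.

```python
# Table 2 mid-points and Table 1 frequencies.
midpoints = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]

def moments_about(origin, xs, fs):
    """Moment coefficients sum(F * d**r) / N for r = 1..4, with d = X - origin."""
    n = sum(fs)
    return [sum(f * (x - origin) ** r for x, f in zip(xs, fs)) / n for r in range(1, 5)]

ORIGIN = 4.5  # arbitrary origin
v1, v2, v3, v4 = moments_about(ORIGIN, midpoints, freqs)

# Equations (6): moments about the mean from moments about the arbitrary origin.
mu2 = v2 - v1 ** 2
mu3 = v3 - 3 * v2 * v1 + 2 * v1 ** 3
mu4 = v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4

# Direct check: the mean is the origin plus v1; take moments about it.
mean = ORIGIN + v1
_, d2, d3, d4 = moments_about(mean, midpoints, freqs)
print(round(mu2, 4), round(d2, 4))
print(round(mu3, 4), round(d3, 4))
print(round(mu4, 4), round(d4, 4))
```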


By the use of Sheppard's correction, μ₂ and μ₄ must be adjusted
for errors involved in the use of mid-points of class intervals as
the X's in the process of calculation.¹ When it is desired to use a
symbol for a population parameter in the present text, μ is
printed in boldfaced type (μ). For the most part, statisticians
deal with statistics and speculate about parameters. Seldom
is it possible to specify the value of a parameter. But it is often
possible to make a maximum likelihood estimate of a population
parameter; the symbol for this is the symbol for the corresponding
statistic, with a breve (˘) above it. Thus the maximum likelihood
estimates of the population moments are represented by
μ̆₁, μ̆₂, . . . , μ̆ₙ.

Types of Frequency Distribution. The moments are important
because it is possible to calculate from them precise measures
that will distinguish types of frequency distributions. The
measures used by Karl Pearson to distinguish types of frequency
distributions are the moments and functions of the moments.²
Certain functions of the moments derived from them are called
“betas.” The first two are defined as follows:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} \qquad (7)$$

For the normal curve, β₁ = 0 and β₂ = 3. The . . . of the normal type.


¹ Cf. pp. 12, 132-134. It should be noted that Sheppard's correction is
for the purpose of removing a bias; it does not correct for inaccuracies
brought about by using class intervals that are too large. GOULDEN, C. H.,
Methods of Statistical Analysis (1939), p. 15.

² Cf. pp. 134-137.
The value of β₁ is related to the skewness of a curve, i.e.,
whether the mode of the distribution is greater or less than
the mean. If the mode is less than the mean, μ₃ is a plus quantity
and the distribution is positively skewed; if the mode is greater
than the mean, μ₃ is a minus quantity and the distribution is
negatively skewed. The square root of β₁, with the sign of the
third moment, is often taken as a measure of skewness.¹

The value of β₂ is related to whether the frequency curve is flat
topped or peaked. When it is flat topped, the shoulders of the
curve are filled out and the tails depleted. When peaked, the
frequency curve is higher at the center and the tails are also
higher. The broad-shouldered frequency curve is said to
be “platykurtic,” and the narrow, or peaked, one is called
“leptokurtic.” The statistic β₂ is said, therefore, “to measure
kurtosis.” If β₂ is more than 3, the frequency curve is a peaked
one; if β₂ is less than 3, the frequency curve is a broad-shouldered
one. For the normal curve, β₂ = 3.
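A minimal sketch of equation (7), together with the signed square root of β₁ as the skewness measure just mentioned. It assumes the grouped data of Table 1 with uncorrected moments, and adds a small symmetric, flat-topped distribution for contrast; the function names are illustrative.

```python
import math

# Table 2 mid-points and Table 1 frequencies.
midpoints = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]

def central_moments(xs, fs):
    """mu2, mu3, mu4 about the mean (no Sheppard correction)."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return tuple(sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n for r in (2, 3, 4))

def betas(xs, fs):
    """beta1 and beta2 of equation (7), plus sqrt(beta1) carrying the sign of mu3."""
    mu2, mu3, mu4 = central_moments(xs, fs)
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    skewness = math.copysign(math.sqrt(beta1), mu3)
    return beta1, beta2, skewness

b1, b2, skew = betas(midpoints, freqs)
print(round(b1, 3), round(b2, 3), round(skew, 3))

# Symmetric, flat-topped comparison: beta1 = 0 and beta2 < 3.
print(betas([-1, 0, 1], [1, 2, 1]))  # (0.0, 2.0, 0.0)
```

For Table 1, μ₃ is positive, so the signed √β₁ is positive, consistent with the mean (about 4.79) lying above the mode (about 4.51).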

Because of their theoretical advantages R. A. Fisher² has
suggested the use of certain k and g statistics instead of the
moment and β statistics. These are defined as follows:

$$\begin{aligned}
k_1 &= \frac{\Sigma X}{N} \\
k_2 &= \frac{\Sigma x^2}{N - 1} \\
k_3 &= \frac{N\,\Sigma x^3}{(N - 1)(N - 2)} \\
k_4 &= \frac{N(N + 1)\,\Sigma x^4 - 3(N - 1)(\Sigma x^2)^2}{(N - 1)(N - 2)(N - 3)}
\end{aligned} \qquad (8)$$

and

$$g_1 = \frac{k_3}{\sqrt{k_2^3}}, \qquad g_2 = \frac{k_4}{k_2^2} \qquad (9)$$

¹ For more complete discussion of skewness and of the normal frequency
curve, see SMITH and DUNCAN, op. cit., pp. 226-230, 263-267, 285-306.
² Statistical Methods for Research Workers, Appendix B to Chap. III.
Cf. FISHER, R. A., Statistical Methods for Research Workers, Chap. III,
Appendix on Technical Notation and Formulae (1932), p. 74; and GOULDEN,
C. H., op. cit., p. 29.
For large values of N, k₁, k₂, and k₃ are practically the same as
X̄, μ₂, and μ₃; and k₄ equals approximately μ₄ − 3μ₂².
It is necessary to apply Sheppard's correction to the second and
fourth k statistics, as follows (the class interval being taken as the unit):

$$k_2 = k_2\,(\text{uncorrected}) - \tfrac{1}{12}, \qquad k_4 = k_4\,(\text{uncorrected}) + \tfrac{1}{120}$$

Also, for large values of N, g₁ is practically equal to √β₁, and
g₂ equals approximately β₂ − 3. Thus, it is readily seen that
g₁ relates to the measurement of skewness and g₂ relates to the
measurement of kurtosis.
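Equations (8) and (9) can be sketched in Python for ungrouped observations, with x measured from the mean. The function name is hypothetical, and no Sheppard correction is applied since the data are not grouped.

```python
def k_statistics(data):
    """Fisher's k1..k4 (equation (8)) and g1, g2 (equation (9)) for ungrouped data."""
    n = len(data)
    if n < 4:
        raise ValueError("k4 needs at least four observations")
    k1 = sum(data) / n
    s2 = sum((x - k1) ** 2 for x in data)  # sum of x**2, x measured from the mean
    s3 = sum((x - k1) ** 3 for x in data)
    s4 = sum((x - k1) ** 4 for x in data)
    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    k4 = (n * (n + 1) * s4 - 3 * (n - 1) * s2 ** 2) / ((n - 1) * (n - 2) * (n - 3))
    g1 = k3 / k2 ** 1.5
    g2 = k4 / k2 ** 2
    return k1, k2, k3, k4, g1, g2

print(k_statistics([1, 2, 3, 4, 5]))
```

For the observations 1, 2, 3, 4, 5 this gives k₂ = 2.5, k₃ = 0, k₄ = −7.5, g₁ = 0 and g₂ = −1.2; the negative g₂ marks a flat-topped (platykurtic) set.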

Standard Deviation and the Variance. From the second
moment about the arithmetic mean a measure of the dispersion
of the frequency distribution is obtained. The second moment
itself is called the “variance.” The square root of the second
moment is called the “standard deviation.”

$$\sigma = \sqrt{\mu_2} \qquad (10)$$

The standard deviation gives a measure of the dispersion of
the frequency distribution that for most purposes is preferable
to the average deviation described below. The standard deviation
of a normal frequency distribution, measured above and
below the arithmetic mean, includes about 68 per cent of the
cases. Twice the standard deviation measured above and below
the arithmetic mean of a normal frequency distribution includes
about 95 per cent of the cases. When a normal distribution of
frequencies is regarded as a distribution of probability, the
probability of a case falling beyond the limits X̄ ± 2σ of a normal
frequency distribution is approximately .0455; the probability of
a case falling beyond X̄ − 2σ is about .023, as is the probability of
a case falling beyond X̄ + 2σ. . . . in sampling theory.

Average Deviation. The average deviation is a measure of
dispersion that has its minimum value when deviations are
measured from the median. To compute the average deviation,
take the deviations of X from the median, ignore their signs,
sum them, and divide by N:

$$\text{A.D.} = \frac{\Sigma F\,|X - Mi|}{N} \qquad (11)$$
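The dispersion measures of equations (10) and (11) can be computed from the grouped data of Table 1 as below. This is a sketch under stated assumptions: mid-points stand for the classes, the median is the interpolated value 4 + (46.5 − 36)/20, and Sheppard's correction is applied to μ₂ only when requested. The hypothetical helper `normal_share_within` uses the error function to check the 68 and 95 per cent figures for the normal curve.

```python
import math

# Table 2 mid-points and Table 1 frequencies; class interval = 1.
midpoints = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]
WIDTH = 1.0
MEDIAN = 4 + (46.5 - 36) / 20 * WIDTH  # interpolated median, about 4.53

def grouped_sd(xs, fs, width, sheppard=False):
    """Standard deviation sqrt(mu2), equation (10); optionally Sheppard-corrected."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    mu2 = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n
    if sheppard:
        mu2 -= width ** 2 / 12  # Sheppard's correction for the second moment
    return math.sqrt(mu2)

def average_deviation(xs, fs, centre):
    """Equation (11): sum of F * |X - centre| divided by N."""
    n = sum(fs)
    return sum(f * abs(x - centre) for x, f in zip(xs, fs)) / n

def normal_share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

mean = sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)
sd_raw = grouped_sd(midpoints, freqs, WIDTH)
sd_corrected = grouped_sd(midpoints, freqs, WIDTH, sheppard=True)
ad_median = average_deviation(midpoints, freqs, MEDIAN)
ad_mean = average_deviation(midpoints, freqs, mean)
print(round(sd_raw, 3), round(sd_corrected, 3), round(ad_median, 3), round(ad_mean, 3))
print(round(normal_share_within(1), 4), round(normal_share_within(2), 4))
```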
The average deviation is less affected by extreme deviations
than the more popular standard deviation, and for this reason 
it probably has greater sampling reliability from extremely 
leptokurtic (peaked) populations. 

Range. The absolute range of a frequency distribution is the
difference between the highest and lowest values of the distribution.
The relative range is this difference divided by the
standard deviation.
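Both forms of the range follow in a few lines. The data in this Python sketch are hypothetical, the function name is illustrative, and the standard deviation is taken as √μ₂, as in equation (10).

```python
def ranges(values):
    """Absolute range (highest - lowest) and relative range (absolute range / sigma)."""
    n = len(values)
    mean = sum(values) / n
    sigma = (sum((v - mean) ** 2 for v in values) / n) ** 0.5  # sqrt(mu2), equation (10)
    absolute = max(values) - min(values)
    return absolute, absolute / sigma

# Hypothetical observations: mean 5, sigma 2.
print(ranges([2, 4, 4, 4, 5, 5, 7, 9]))
```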

Bivariate and Multivariate Distributions. In the univariate
frequency distributions discussed in the preceding sections, the
data were classified according to a single characteristic. In
bivariate or multivariate distributions, data are classified
according to two or more characteristics. Table 3 is an illustration
of a bivariate frequency distribution. The frequency
distribution showing grades of the 81 Mount Holyoke freshmen
in second-semester English, shown in the total column at the
right, under the caption F, is cross classified to show how each
group of freshmen did in its first-semester English. Thus, of the
8 students having second-semester grades between 180 and 200,
row 7 of Table 3 shows that 2 had first-semester grades between
140 and 160, 4 had first-semester grades between 160 and 180,
and 2 had first-semester grades between 180 and 200. This is a
small univariate frequency distribution of the group of students
who had grades between 180 and 200 in their second-semester
course. In Table 3 there are 13 rows and 12 columns, of which
11 (in both instances) contain univariate frequency distributions.
Since there are 11 subgroups of 11 groups, there are altogether
121 classes, represented by 121 squares or cells in the table, of
which 28 contain frequencies.

TABLE 3.—A BIVARIATE FREQUENCY DISTRIBUTION OF 81 MOUNT HOLYOKE
FRESHMEN ACCORDING TO THEIR GRADES IN FIRST- (X₂) AND
SECOND- (X₁) SEMESTER ENGLISH

[Body of table not legible in the source. Rows: second-semester grade
classes (X₁); columns: first-semester grade classes (X₂); total F = 81.]

Fig. 5.—Progression of the means of X₁ with changes in X₂.
The characteristics of a bivariate frequency distribution can
be described by various statistics. Many of these are the same
as the statistics employed in the description of a univariate frequency
distribution, and some are new. Thus, the central
tendency of one of the two variables may be measured by its
mean, or its mode, or its median. Similarly, the dispersion of
this variable may be measured by its range, its standard deviation,
its average deviation, or its quartile deviation; and its
skewness and kurtosis may be measured by β₁ and β₂, respectively.
The same is true of the other variable and of the numerous
univariate frequency distributions that make up the details
of a single bivariate distribution, i.e., each frequency distribution
of the rows and columns.

Fig. 6.—Progression of the means of X₂ with changes in X₁.
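Because every row and column of a bivariate table is itself a univariate distribution, its mean is computed just as before. The Python sketch below uses a small hypothetical 3 × 3 table, not Table 3, arranged like Table 3 with rows for classes of X₁ and columns for classes of X₂. The column means trace the progression of X₁ with X₂, and the row means that of X₂ with X₁; the function names are illustrative.

```python
# Hypothetical bivariate frequency table (NOT Table 3):
# rows are classes of X1, columns are classes of X2,
# and table[i][j] counts the cases falling in both classes.
x1_mid = [150.0, 190.0, 230.0]  # mid-points of the X1 (row) classes
x2_mid = [150.0, 190.0, 230.0]  # mid-points of the X2 (column) classes
table = [
    [3, 1, 0],
    [2, 5, 2],
    [0, 2, 4],
]

def column_means(table, row_mids):
    """Mean of the row variable (X1) within each column, i.e. each X2 class."""
    means = []
    for j in range(len(table[0])):
        col = [row[j] for row in table]
        n = sum(col)
        means.append(sum(f * x for f, x in zip(col, row_mids)) / n if n else None)
    return means

def row_means(table, col_mids):
    """Mean of the column variable (X2) within each row, i.e. each X1 class."""
    return [sum(f * x for f, x in zip(row, col_mids)) / sum(row) if sum(row) else None
            for row in table]

print(column_means(table, x1_mid))  # progression of X1 means with X2
print(row_means(table, x2_mid))     # progression of X2 means with X1
```

Empty columns or rows return `None` rather than dividing by zero, since such classes have no mean to plot.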

Progressions of Means. If the data are grouped in the form
of a bivariate scatter diagram such as Table 3, one way to measure
the association between the two variables is to compute the
mean values of one variable for various values of the other, i.e.,
the means of the rows and the means of the columns of Table 3.
The means of the columns show how the X₁ variable tends to
change, on the average, with changes in X₂; and the means of the
rows show how the X₂ variable tends to change, on the average,
with changes in X₁. These means have been computed and
shown, respectively, in Figs. 5 and 6, in which the means are
connected by a series of straight lines; these are called “polygons
of regression.”

Fig. 7.—The fitting of the line of regression of X₁ on X₂ by the method of least
squares (vertical deviations minimized). (Line shown: X₁ = 47.58 + .8322X₂.)

Lines of Regression. Consider the hypothesis that an increase
in X₂ of one unit always produces an increase in X₁ of, say, b units,
b being a constant. If no other forces were affecting X₁, all the
means of X₁ would lie on a straight line; all of them would be on
the line representing the equation X₁ = 47.58 + .8322X₂ shown in Fig. 7.

If, however, other forces also affect $X_1$, causing it to be higher or lower than the value expected from its association with $X_2$, the actual values of the means would not fall on the straight line but would be scattered about the line in the manner shown in Fig. 7. According to such a hypothesis, a straight line fitted to the data should give the law of relationship between $X_1$ and $X_2$, and the scatter about the line should give the deviation from this line caused by the other factors affecting $X_1$.

A similar view could be taken of the variation in the mean value of $X_2$ with changes in $X_1$, and it would justify drawing a straight line to show the law of relationship between $X_2$ and $X_1$. This is illustrated in Fig. 8.

[Fig. 8. The fitting of the line of regression of $X_2$ on $X_1$ by the method of least squares (horizontal deviations minimized). The figure marks $\bar{X}_1 = 217.4$ and the fitted line $X_2 = -5.548 + 0.9642X_1$.]
The lines that are derived to show the relationship between the mean value of one variable and the value of another are called “lines of regression,” following the terminology of Francis Galton, who used this term in his original study of the relationship between the heights of children and the heights of their parents. A line of regression of one variable on another is to be interpreted as indicating the values of the first, the dependent variable, that would be obtained for various values of the second, the independent variable, if no other forces were affecting the dependent variable.

First-order Standard Deviations. In dealing with bivariate 
and multivariate frequency distributions, it becomes necessary 
to differentiate among zero-order standard deviations, first-order 
standard deviations, second-order standard deviations, etc. 
The zero-order standard deviations are standard deviations of 
the original variables. The first-order, second-order, and higher- 
order standard deviations are standard deviations of variation 
from lines and planes of regression. 

In the case of the univariate distribution, the representativeness of the mean depended upon how closely the cases were scattered around this mean value. This scatter was measured by the zero-order standard deviation. In a bivariate distribution, the representativeness of a line of regression, i.e., the representativeness of the line as a description of the relationship between the two variables, depends in the same way upon how closely the cases are scattered about it.¹ The representativeness of the line of regression depicting the equation $X_1 = 47.58 + .8322X_2$, used in Fig. 7, may be indicated by the scatter of cases above and below it. A measure of this scatter would be the standard deviation of the vertical deviations (of cases, not of means) from the line. The standard deviation about the line $X_2 = -5.548 + .9642X_1$ shown in Fig. 8 would be the standard deviation of the horizontal deviations from that line, made by the scatter of cases (not by the scatter of means) about the line. These two standard deviations are called “first-order standard deviations.”

The first-order standard deviations can never exceed the zero-order standard deviations, and they are smaller whenever the two variables are correlated, for the part of the variation represented by the line of regression has been eliminated by taking the deviations from the line.

¹ Cf. p. 19 for the method used to derive the equations for these lines.

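If the equation for the line of regression is $X_1 = a_{12} + b_{12}X_2$, the first-order standard deviations are found as follows:

$$N\sigma_{1.2}^2 = \sum X_1^2 - a_{12}\sum X_1 - b_{12}\sum X_1X_2 \qquad (12)$$

and

$$N\sigma_{2.1}^2 = \sum X_2^2 - a_{21}\sum X_2 - b_{21}\sum X_1X_2 \qquad (12')$$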

Sometimes these first-order standard deviations are called 
"standard errors of estimate" since they indicate the error 
involved in using the line of regression as an estimate of the 
dependent variable. 
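As a numerical check on Eq. (12), the sketch below fits the least-squares line of $X_1$ on $X_2$ and computes the first-order variance two ways: from the raw sums of Eq. (12) and directly from the vertical deviations about the line. The paired values are hypothetical (not the Table 3 data), and the function names are illustrative only.

```python
# Hypothetical paired observations (not the data of Table 3).
X2 = [120, 150, 170, 200, 230, 260]
X1 = [150, 170, 190, 210, 240, 255]


def fit_line(dep, indep):
    """Least-squares line dep = a + b * indep (deviations in `dep` minimized)."""
    n = len(dep)
    mean_dep = sum(dep) / n
    mean_indep = sum(indep) / n
    sxy = sum((u - mean_indep) * (v - mean_dep) for u, v in zip(indep, dep))
    sxx = sum((u - mean_indep) ** 2 for u in indep)
    b = sxy / sxx
    a = mean_dep - b * mean_indep
    return a, b


def first_order_variance_eq12(dep, indep):
    """Eq. (12): N * sigma^2 = sum(dep^2) - a * sum(dep) - b * sum(dep * indep)."""
    a, b = fit_line(dep, indep)
    n = len(dep)
    total = (sum(v * v for v in dep)
             - a * sum(dep)
             - b * sum(u * v for u, v in zip(indep, dep)))
    return total / n


def first_order_variance_direct(dep, indep):
    """Mean squared deviation of `dep` from the fitted line."""
    a, b = fit_line(dep, indep)
    return sum((v - (a + b * u)) ** 2 for u, v in zip(indep, dep)) / len(dep)


print(first_order_variance_eq12(X1, X2), first_order_variance_direct(X1, X2))  # sigma^2_{1.2}
print(first_order_variance_eq12(X2, X1), first_order_variance_direct(X2, X1))  # sigma^2_{2.1}
```

Swapping the arguments gives the regression of $X_2$ on $X_1$, i.e., Eq. (12'). The shortcut works because the least-squares normal equations make the residuals sum to zero and be uncorrelated with the independent variable.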

The Pearsonian Coefficient of Correlation. The progression of the means and the lines of regression described above are concerned with depicting the “law of relationship” between the two variables. They give the average value of one variable associated with given values of the other variable and show how these average values tend to change in unison with the other variable. Another statistic, called the “Pearsonian coefficient of correlation,” aims to measure the degree of association between the two variables. This coefficient of correlation is found by the equation

$$r_{12} = \frac{\sum x_1x_2}{N\sigma_1\sigma_2} \qquad (13)$$

in which $x_1$ and $x_2$ refer to deviations from the means and $N$ to the number of pairs of cases.
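Eq. (13) translates directly into a few lines of code. This is a minimal sketch that uses population (divide-by-N) standard deviations, to match the $N$ in the denominator; the function name and sample values are hypothetical.

```python
import math


def pearson_r(x1_values, x2_values):
    """Eq. (13): r12 = sum(x1 * x2) / (N * sigma1 * sigma2), x's measured from their means."""
    n = len(x1_values)
    mean1 = sum(x1_values) / n
    mean2 = sum(x2_values) / n
    x1 = [v - mean1 for v in x1_values]             # deviations from the mean of X1
    x2 = [v - mean2 for v in x2_values]             # deviations from the mean of X2
    sigma1 = math.sqrt(sum(d * d for d in x1) / n)  # zero-order SDs, divisor N
    sigma2 = math.sqrt(sum(d * d for d in x2) / n)
    return sum(a * b for a, b in zip(x1, x2)) / (n * sigma1 * sigma2)


print(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]))   # perfectly linear, rising
print(pearson_r([1, 2, 3, 4], [40, 30, 20, 10]))   # perfectly linear, falling
```

Because the formula is symmetric in the two variables, swapping the arguments leaves the value unchanged, so $r_{12} = r_{21}$.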

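Relationship between r and the First-order Standard Deviation. If the variables are measured from their mean values, Eq. (12) becomes

$$N\sigma_{1.2}^2 = \sum x_1^2 - b_{12}\sum x_1x_2 \qquad \text{since } \sum x_1 = 0$$

If the value of $b_{12}$ is determined by the method of least squares, as Eqs. (12) and (12') presuppose,¹

$$b_{12} = \frac{\sum x_1x_2}{\sum x_2^2} = \frac{Nr_{12}\sigma_1\sigma_2}{N\sigma_2^2} = r_{12}\frac{\sigma_1}{\sigma_2}$$

since $\sum x_2^2 = N\sigma_2^2$ and, by Eq. (13), $\sum x_1x_2 = Nr_{12}\sigma_1\sigma_2$; and the value of $N\sigma_{1.2}^2$ may be written

$$N\sigma_{1.2}^2 = N\sigma_1^2 - N\sigma_1^2 r_{12}^2$$

so that

$$\sigma_{1.2}^2 = \sigma_1^2\,(1 - r_{12}^2)$$

and, finally,

$$\sigma_{1.2} = \sigma_1\sqrt{1 - r_{12}^2} \qquad (14)$$

In the same manner,

$$\sigma_{2.1} = \sigma_2\sqrt{1 - r_{12}^2} \qquad (14')$$

¹ Cf. Smith and Duncan, op. cit., pp. 331-333.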
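Eq. (14) can be checked numerically: the standard deviation of the residuals about the least-squares line should equal $\sigma_1\sqrt{1 - r_{12}^2}$. The sketch below works in deviation form, so the fitted line passes through the origin; the data are hypothetical and the function name is illustrative.

```python
import math

# Hypothetical paired observations (illustrative only, not Table 3).
X1 = [150, 170, 190, 210, 240, 255]
X2 = [120, 150, 170, 200, 230, 260]


def check_eq14(x1_values, x2_values):
    """Return (sigma_{1.2} from residuals, sigma_1 * sqrt(1 - r12^2)) for comparison."""
    n = len(x1_values)
    m1, m2 = sum(x1_values) / n, sum(x2_values) / n
    x1 = [v - m1 for v in x1_values]            # deviations from the means
    x2 = [v - m2 for v in x2_values]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    sigma1, sigma2 = math.sqrt(s11 / n), math.sqrt(s22 / n)
    r12 = s12 / (n * sigma1 * sigma2)           # Eq. (13)
    b12 = s12 / s22                             # least squares; equals r12 * sigma1 / sigma2
    residual_var = sum((a - b12 * b) ** 2 for a, b in zip(x1, x2)) / n
    return math.sqrt(residual_var), sigma1 * math.sqrt(1 - r12 ** 2)


direct, via_r = check_eq14(X1, X2)
print(direct, via_r)
```

Calling `check_eq14(X2, X1)` performs the corresponding check of Eq. (14').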


These equations, it is to be noted, may be put in the following form:

$$r_{12}^2 = 1 - \frac{\sigma_{1.2}^2}{\sigma_1^2} \qquad (15)$$

$$r_{12}^2 = 1 - \frac{\sigma_{2.1}^2}{\sigma_2^2} \qquad (15')$$

From this form [Eq. (15)] it is readily seen that r is closely related to the first-order variance, i.e., to the scatter about the line of regression. If the scatter about the line of regression is a small percentage of the total scatter, this signifies that the line of regression itself accounts for a large part of the variation in $X_1$, and $r_{12}$ is numerically high. If the scatter (the first-order standard deviation) about the line of regression is a large percentage of the total variation in the dependent variable, this signifies that only a small part of the variation is shown in the line of regression, and $r_{12}$ is low. That is, the better the line of regression fits the data, the higher the numerical value of r, and vice versa. The Pearsonian coefficient of correlation is thus a measure of the goodness of fit of the lines of regression.

The Pearsonian Coefficient of Correlation and the Breakup of Variance. For every point on a bivariate scatter diagram such as Fig. 7, there is a corresponding point on the line of regression of $X_1$ on $X_2$. Geometrically, the latter is obtained by projecting the former vertically onto the line of regression. This is illustrated by points P and P' in Fig. 7. Algebraically, the $X_1$ coordinate of a point on the line of regression is found by substituting the given value of $X_2$ in the regression equation $X_1 = a_{12} + b_{12}X_2$, or, if the X's are both in terms of deviations from their respective means, $x_1' = r_{12}\dfrac{\sigma_1}{\sigma_2}x_2$.

When the variables are measured from their respective mean values, the mean of the various values of $x_2$ is zero. Hence the mean of the corresponding values of $x_1'$ is zero also; and the variance of these $x_1'$ values is accordingly as follows:

$$\sigma_{x_1'}^2 = \frac{\sum (x_1')^2}{N} = r_{12}^2\,\frac{\sigma_1^2}{\sigma_2^2}\cdot\frac{\sum x_2^2}{N} = r_{12}^2\sigma_1^2$$

This describes the part of the variance in $X_1$ that is depicted by the line of regression; and it is to be noted that Eq. (15) may consequently be written

$$\sigma_{1.2}^2 = \sigma_1^2 - \sigma_{x_1'}^2$$

or, in other words,

$$\sigma_1^2 = \sigma_{x_1'}^2 + \sigma_{1.2}^2 \qquad (16)$$

$$\sigma_2^2 = \sigma_{x_2'}^2 + \sigma_{2.1}^2 \qquad (16')$$


Equation (16) says that the total variance in $X_1$ values is equal to the variance of the corresponding points on the line of regression plus the variance of the deviations from these points. Another way of looking at this is that the total variance in $X_1$ is made up of two parts, one consisting of the variance due to its association with $X_2$ ($\sigma_{x_1'}^2$) as represented by the line of regression, the other representing the variance of $X_1$ due to its association with factors independent of $X_2$ ($\sigma_{1.2}^2$). It was also found just above that $\sigma_{x_1'}^2 = r_{12}^2\sigma_1^2$; hence,

$$r_{12}^2 = \frac{\sigma_{x_1'}^2}{\sigma_1^2} \qquad (17)$$

$$r_{12}^2 = \frac{\sigma_{x_2'}^2}{\sigma_2^2} \qquad (17')$$

These equations shed further light on the meaning of r, consistent with the explanation of the significance of Eq. (16). Equation (17) shows that $r_{12}^2$ measures the proportion of the total variance in $X_1$ that is due to its association with $X_2$. It also measures the proportion of the total variance in $X_2$ that is due to its association with $X_1$, as indicated in Eq. (17').
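The breakup of variance in Eqs. (16) and (17) can be verified on any set of paired values. The sketch below, using hypothetical data and an illustrative function name, computes the total variance of $X_1$, the variance of the points $x_1'$ on the line, and the variance about the line.

```python
import math

# Hypothetical paired observations (illustrative only, not Table 3).
X1 = [150, 170, 190, 210, 240, 255]
X2 = [120, 150, 170, 200, 230, 260]


def variance_breakup(x1_values, x2_values):
    """Return (sigma_1^2, sigma^2_{x1'}, sigma^2_{1.2}, r12) for the regression of X1 on X2."""
    n = len(x1_values)
    m1, m2 = sum(x1_values) / n, sum(x2_values) / n
    x1 = [v - m1 for v in x1_values]          # deviations from the means
    x2 = [v - m2 for v in x2_values]
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    fitted = [s12 / s22 * b for b in x2]      # x1' = b12 * x2, points on the line
    total = sum(a * a for a in x1) / n                                # sigma_1^2
    on_line = sum(f * f for f in fitted) / n                          # sigma^2_{x1'}
    about_line = sum((a - f) ** 2 for a, f in zip(x1, fitted)) / n    # sigma^2_{1.2}
    r12 = s12 / (n * math.sqrt(total) * math.sqrt(s22 / n))          # Eq. (13)
    return total, on_line, about_line, r12


total, on_line, about_line, r12 = variance_breakup(X1, X2)
print(total, on_line + about_line)   # Eq. (16): the two should agree
print(r12 ** 2, on_line / total)     # Eq. (17): the two should agree
```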

The Correlation Ratio. The correlation ratio is a measure of correlation designed to be used in cases where the correlation between the two variables is nonlinear. In such instances it is not appropriate to compute a straight line of regression; but an analysis similar to that outlined above may be applied. This latter analysis makes use of the polygon of regression rather than the line of regression; i.e., it makes use of the means of the columns and the means of the rows. The first-order variance is calculated by squaring the deviations in each column from its column mean, summing these squares over all the columns, and dividing by $N$; this is the variance in $X_1$ due to association with factors independent of $X_2$. The amount of variance in $X_1$ explained by association or correlation with $X_2$ is then the variance of the column means about their mean (each column mean weighted by the number of cases in its column), which is the mean of the whole distribution, namely, $\bar{X}_1$. The ratio of either of these variances to the total variance can be used to measure nonlinear correlation, and the ratio is called the “correlation ratio.” The symbol used is the Greek lower-case letter eta ($\eta$). While it has been seen that $r_{12} = r_{21}$, it is not true that $\eta_{12}$ equals $\eta_{21}$.

The correlation ratios may be defined by the following equations:

$$\eta_{12}^2 = \frac{\text{variance of the column means of } X_1}{\sigma_1^2} \qquad (18)$$

$$\eta_{21}^2 = \frac{\text{variance of the row means of } X_2}{\sigma_2^2} \qquad (18')$$
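A small sketch may make the asymmetry of the correlation ratio concrete. It assumes ungrouped data in which each distinct value of the independent variable plays the role of a column (or row) of the scatter diagram, and it computes $\eta^2$ as the weighted variance of the group means divided by the total variance. The data and the function name are hypothetical.

```python
from collections import defaultdict


def eta_squared(dep, indep):
    """Squared correlation ratio of `dep` on `indep`.

    Cases are grouped by the value of `indep` (one column of the scatter
    diagram per value); the variance of the group means of `dep`, each
    weighted by its group size, is divided by the total variance of `dep`.
    """
    n = len(dep)
    grand_mean = sum(dep) / n
    groups = defaultdict(list)
    for y, x in zip(dep, indep):
        groups[x].append(y)
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()) / n
    total = sum((y - grand_mean) ** 2 for y in dep) / n
    return between / total


# Hypothetical data with an exact but nonlinear (U-shaped) relation: X1 = X2 squared.
X2 = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
X1 = [x * x for x in X2]

print(eta_squared(X1, X2))   # eta_12^2, X1 on X2: 1.0
print(eta_squared(X2, X1))   # eta_21^2, X2 on X1: 0.0
```

For these data the Pearsonian $r$ is zero, because the relation is symmetric about $X_2 = 0$, yet $\eta_{12} = 1$ since $X_1$ is completely determined by $X_2$, while $\eta_{21} = 0$ since every row mean of $X_2$ is zero.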


PART I
General Theory of Frequency Curves


CHAPTER II 
PROBABILITY AND THE PROBABILITY CALCULUS 


The theory of frequency curves is in large measure a special application of the theory of probability. An understanding of probability and the probability calculus is therefore essential for a study of the theory of frequency curves. Probability theory in turn is principally based on combinatorial analysis. This chapter on probability and the probability calculus will accordingly begin with a review of permutations and combinations.

Permutations and Combinations. Permutations. A permutation is an arrangement. If there are N things, they may be arranged in N! different ways; for the first selection may be made in N ways, the second selection in $N - 1$ ways, the third in $N - 2$ ways, etc. Hence the total number of arrangements, or permutations, that may be made of N things is

$$N(N-1)(N-2)\cdots 1 = N!$$

For example, the number of different permutations of five things is $5! = 5 \times 4 \times 3 \times 2 \times 1 = 120$.

Sometimes in forming permutations only a fraction of the total number of objects can be selected at one time. Suppose, for example, that an arrangement is to consist of three objects, but there are five objects from which to choose. In this instance the total number of different arrangements that can be made by the selection of three objects from the group of five will equal $5 \times 4 \times 3 = 60$, for there are five ways of selecting the first object, four ways of selecting the second, and three ways of selecting the third.

In general, the number of permutations of N things taken r at a time is

$$P_r^N = N(N-1)(N-2)\cdots(N-r+1) = \frac{N!}{(N-r)!} \qquad (1)$$
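Eq. (1) and the five-object example can be checked by computing the formula and, separately, listing every arrangement. This is a minimal sketch using Python's standard `math` and `itertools` modules; the function name is illustrative.

```python
import math
from itertools import permutations


def permutations_count(n, r):
    """Eq. (1): number of permutations of n things taken r at a time, n!/(n - r)!."""
    return math.factorial(n) // math.factorial(n - r)


print(permutations_count(5, 5))              # 120, i.e., 5!
print(permutations_count(5, 3))              # 60 = 5 x 4 x 3
print(len(list(permutations("abcde", 3))))   # 60, by listing every arrangement
```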
Combinations. Some of the permutations of Eq. (1) will be constituted alike; they will merely be different arrangements of the same combination of things. For example, suppose the five objects are the letters a, b, c, d, e. If permutations are made of these five letters three at a time, two of these permutations would be abc and acb, which are the same combination of the letters a, b, and c arranged in different order. To find the number of different combinations of three letters each that may be made from the group of five letters, it is necessary to allow for the number of permutations that can be made from any one combination. In the first section it was found that a given set of N objects can be arranged in N! ways. Hence every combination of three letters can be made to yield $3! = 3 \times 2 \times 1 = 6$ permutations without changing the combination. It follows that the total number of combinations of three letters each that may be made from a group of five letters is equal to the total number of permutations of five things taken three at a time [$N!/(N-r)! = 5!/2!$] divided by the number of permutations of three things taken all at a time ($r! = 3!$); that is, $C = 5!/(2!\,3!) = 10$.

In general, the number of combinations of N things taken r at a time equals

$$C_r^N = \frac{N!}{r!\,(N-r)!} \qquad (2)$$


The last equation may also be looked upon as the number of 
different combinations that may be made of N things by put- 
ting r of them in one category and N — r in another. For 
obviously the number of different combinations that may be 
made of 10 men by putting 3 of them on a committee and leaving 
7 of them off is the same as the number of different committees 
of 3 that may be picked from 10 men. 
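Eq. (2), the five-letter example, and the committee argument just given can all be checked in a few lines. This is a minimal sketch with an illustrative function name; it lists the combinations with `itertools.combinations` for comparison.

```python
import math
from itertools import combinations


def combinations_count(n, r):
    """Eq. (2): number of combinations of n things taken r at a time, n!/(r!(n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


print(combinations_count(5, 3))                  # 10
print(len(list(combinations("abcde", 3))))       # 10, by listing
# Choosing 3 committee members from 10 men is the same as choosing the 7 left off.
print(combinations_count(10, 3), combinations_count(10, 7))   # 120 120
```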

This new way of looking at the problem is especially helpful 
when more than two categories are involved. Suppose, for 
example, that three committees are to be chosen from 10 men,
say a finance committee consisting of 3 men, a production com- 
mittee consisting of 5 men, and a personnel committee consisting 
of 2 men. If each man is to serve on a committee, but no man 
is to serve on more than one committee, how many different 
committees of this kind could be formed from the 10 men? Here 
it is a question of how many different combinations can be made
of 10 things, 3 of them to be placed in one category, 5 in another, 
and 2 in another. 

The answer to this broader question is obtained in the same way as the solution of the two-category case. The total number of different ways of picking 10 men from 10 men is 10!. But not all these different ways of picking 10 men will lead to differently constituted committees. For any one set of committees could be picked in (3!)(5!)(2!) different ways without changing the make-up of the committees. Thus, if the finance committee consisted of Mr. A, Mr. G, and Mr. J, the committee could be picked in that order, or in the order A, J, G, or the order G, A, J, or G, J, A, or J, A, G, or finally J, G, A. But each of these 3! ways of picking the finance committee could be combined with the 5! ways of picking the production committee, which would make (3!)(5!) different ways of picking these two committees. Finally, each of these (3!)(5!) ways of picking the finance and production committees could be combined with the 2! ways of picking the personnel committee, making a total of (3!)(5!)(2!) different ways of selecting these three committees without changing their ultimate constituency. It follows, therefore, that the total number of different committees that can be picked is equal to $10!/(3!\,5!\,2!) = 2{,}520$. In general, the number of different combinations that may be made of N things by putting $N_1$ of them in one category, $N_2$ of them in another category, $N_3$ of them in a third category, ..., and $N_k$ of them in a kth category, where

$$N = N_1 + N_2 + N_3 + \cdots + N_k$$

is

$$C_{N_1, N_2, \ldots, N_k}^N = \frac{N!}{N_1!\,N_2!\cdots N_k!} \qquad (3)$$
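The committee count can be confirmed by brute force: choose the finance committee, then the production committee from the men who remain, and let the rest form the personnel committee. The sketch below compares that listing with Eq. (3); the ten men are hypothetical labels and the function names are illustrative.

```python
import math
from itertools import combinations


def multinomial_count(*sizes):
    """Eq. (3): N!/(N1! N2! ... Nk!) with N = N1 + N2 + ... + Nk."""
    count = math.factorial(sum(sizes))
    for size in sizes:
        count //= math.factorial(size)   # each partial quotient is an integer, so // is exact
    return count


def count_committees_by_listing(men, finance_size=3, production_size=5):
    """Count assignments by listing them; whoever is left forms the personnel committee."""
    count = 0
    for finance in combinations(men, finance_size):
        remaining = [m for m in men if m not in finance]
        for _production in combinations(remaining, production_size):
            count += 1
    return count


men = list("ABCDEFGHIJ")                 # ten hypothetical men
print(multinomial_count(3, 5, 2))        # 2520
print(count_committees_by_listing(men))  # 2520
```

With only two categories, `multinomial_count(r, n - r)` reduces to Eq. (2).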

The Binomial Expansion. An important use of the combinatorial analysis is in the derivation of a formula for the expansion of the binomial $(a + b)^N$. The expression $(a + b)^N$ means the multiplication of $(a + b)$ by itself N times. Terms of the product are thus formed by multiplying the a's taken from some of the factors by the b's taken from the remaining factors. The product $a^r b^{N-r}$ will therefore appear $C_r^N$ times, since this is the number of different ways in which r of the a's can be selected from N factors, no consideration being given to the order of selection.

The equation for the expansion of the binomial $(a + b)^N$ is accordingly

$$(a + b)^N = C_N^N a^N + C_{N-1}^N a^{N-1}b + \cdots + C_r^N a^r b^{N-r} + \cdots + C_0^N b^N$$

This is known as the “binomial expansion.” As will be seen in the next chapter, there is a frequency distribution whose relative frequencies are computed in the same way as the terms of the binomial expansion. It is consequently known as the “binomial distribution.”
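The argument above can be imitated mechanically: multiply $(a + b)$ by itself N times, keeping track of the coefficient of each $a^r b^{N-r}$, and compare the result with $C_r^N$ from Eq. (2). This is a minimal sketch; the function name is illustrative.

```python
import math


def expand_binomial(n):
    """Coefficient of a^r b^(n-r), r = 0..n, found by multiplying (a + b) by itself n times."""
    coeffs = [1]                                   # (a + b)^0 = 1
    for _ in range(n):
        new = [0] * (len(coeffs) + 1)
        for r, c in enumerate(coeffs):
            new[r] += c        # term multiplied by b: power of a unchanged
            new[r + 1] += c    # term multiplied by a: power of a raised by one
        coeffs = new
    return coeffs


n = 10
by_multiplication = expand_binomial(n)
by_formula = [math.factorial(n) // (math.factorial(r) * math.factorial(n - r)) for r in range(n + 1)]
print(by_multiplication)                 # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print(by_multiplication == by_formula)   # True
```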

The Multinomial Expansion. An argument similar to that just outlined shows that the coefficient of the general term $X_1^{N_1}X_2^{N_2}\cdots X_k^{N_k}$ of the expansion of the multinomial $(X_1 + X_2 + X_3 + \cdots + X_k)^N$ is given by Eq. (3). There is also a frequency distribution whose relative frequencies are computed in the same way as the terms of this multinomial expansion, and it is hence called a “multinomial distribution.”

Mathematical Probability. Definition. If, in a given set of t objects, m possess a given property and n do not possess this property, the probability of an object of this set having the given property is m/t, or the relative frequency of these objects in the set. The word “object” may include events that have the property of occurring, or statements that have the property of being true, as well as physical objects. To illustrate probability, consider an ordinary deck of playing cards, containing 52 cards of which 13 are hearts: the probability of a heart is 13/52 = 1/4. This is also the probability of a heart in a pinochle deck, which contains 48 cards of which 12 are hearts. In these illustrations the probability set is finite. The definition is equally valid, however, for infinite sets.¹ The probability of an ace, on the other hand, is not the same in a pinochle deck as in an ordinary deck, since a pinochle deck contains 48 cards of which 8 are aces, while an ordinary deck contains 52 cards of which 4 are aces. If a probability of an ace is defined with reference to an ordinary deck, it must not in subsequent stages of the analysis be taken as referring to a pinochle deck. This should be clear to anyone; yet the pitfalls of such a shift lie athwart the path to more intangible analysis, a path that is strewn with the intellectual bones of the unwary.

¹ See Chap. XIII.

In this connection it is to be noted that the probability of a heart in an ordinary deck, say, is not necessarily the same as the probability of drawing a heart from an ordinary deck. The first probability refers to a set of 52 cards, 13 of which are hearts, so that the probability of a heart in an ordinary deck is clearly 13/52 = 1/4. The second probability refers to a set of drawings from an ordinary deck. The probability of a heart in this set of drawings is the relative frequency of the number of hearts in the total set of cards drawn. The precise value of this second probability cannot be given until the number of cards drawn and the number of hearts among them have been counted. It may be 1/4, or it may be some other value.

Law of Large Numbers. That the probability of drawing a heart from an ordinary deck is commonly said to be 1/4 is the outcome of experience with certain kinds of mass phenomena. It has been found that if cards are drawn at random from an ordinary deck, the card being replaced and the deck reshuffled after each draw, the ratio of the number of hearts to the total number of cards drawn tends to approximate 1/4 whenever the number of drawings is large. This experience with mass random¹ phenomena is generalized in the law of large numbers.

The law of large numbers says that, when a large number of 
random events is involved, it is possible to predict with reasonable 
accuracy the relative frequency of recurrence of a particular 
event by calculating a certain mathematical probability ascer- 
tainable from a carefully defined probability set. This law is 
the link between abstract calculations of probability and the 
prediction of relative frequencies of events in real life. With the
assumption of its validity, the principal problem in most cases
becomes that of finding the mathematical model that is the
correct one for the particular phenomenon in question.
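The card experiment described above can be imitated with a pseudo-random number generator. This is a minimal sketch using Python's standard `random` module; the seed and the numbers of draws are arbitrary choices, and drawing with `choice` from an unchanged list stands in for replacing the card and reshuffling after each draw.

```python
import random


def relative_frequency_of_hearts(draws, seed=0):
    """Draw `draws` cards with replacement from a 52-card deck; return the share of hearts."""
    rng = random.Random(seed)                 # seeded so a run can be repeated
    deck = ["heart"] * 13 + ["other"] * 39    # 13 hearts among 52 cards
    hearts = sum(1 for _ in range(draws) if rng.choice(deck) == "heart")
    return hearts / draws


for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency_of_hearts(n))
```

For a handful of draws the ratio can stray well away from 1/4; as the number of draws grows it settles close to 1/4, which is what the law of large numbers asserts. The exact figures printed depend on the seed.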

The probabilities in some sets are unknown but have been empirically approximated. Thus, if the empirically determined probability of a man dying at age fifty is used in practice by insurance companies as a method of predicting, experience shows that these empirically determined probabilities yield good predictions if a large enough number of men is involved. With large masses of data, therefore, empirically determined probabilities may be used to make predictions in the same way as the known probabilities of a given set, as illustrated above.

¹ For a discussion of “randomness” see pp. 154-162.

Probability Distributions. Definitions. Any ordinary frequency distribution may be expressed in the form of a probability distribution by describing the frequencies as percentages of the total number of cases. A probability distribution can therefore be discrete or continuous. A continuous frequency curve that represents the distribution of relative frequency of an infinite population of cases is also a probability curve. Accordingly, all the measures of the various characteristics of frequency distributions, which have been summarized in the preceding chapter, apply to probability distributions; thus a probability distribution has a mean, a standard deviation, a coefficient of skewness, and a coefficient of kurtosis, like any frequency distribution. There are also bivariate and multivariate distributions of probability, corresponding to bivariate and multivariate frequency distributions.

Probability Equations. Probability equations may be written in two ways. If the distribution is discrete, the probability of the attribute X may be written simply $Y = \varphi(X)$. The probability in this case is represented by the height of the ordinate Y at the abscissa point X. If the distribution is continuous, the equation $Y = \varphi(X)$ serves as an algebraic description of the curve, but it is not a true measure of the probability. It merely represents the height of the curve at an abscissa point X. For continuous distributions the proper form for representing probability is

$$d\left(\frac{F}{N}\right) = \varphi(X)\,dX \qquad \text{or} \qquad dP = \varphi(X)\,dX$$

In this form the probability, or relative frequency, of a case lying between X and X + dX is expressed as a function of the attribute X. A probability, or frequency, curve is the limit approached by an area histogram as the class interval is made infinitesimally small. As the class interval (of size dX) is made infinitesimally small, the area under the curve for any class interval [that is, $d(F/N)$ or $dP$] is approximately equal to the area of a rectangle whose base is dX and whose height is the ordinate of the curve [$\varphi(X)$] at some arbitrary value of X within the interval.

As an illustration, consider the normal probability curve. The algebraic equation for the normal curve is

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(X-\bar{X})^2/2\sigma^2} \qquad (5)$$

This expresses the ordinate Y simply as a function of the abscissa X. The probability, however, of a normal variate lying between X and X + dX is given by the equation

$$dP = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(X-\bar{X})^2/2\sigma^2}\,dX \qquad (5')$$

In both cases e (= 2.71828...) is the base of the Napierian system of logarithms, $\bar{X}$ is the mean of the distribution, and $\sigma$ its standard deviation.
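The distinction between the ordinate of Eq. (5) and the probability element of Eq. (5') can be made concrete by adding up $Y\,dX$ over many narrow strips and comparing the total with the exact area under the normal curve. This is a minimal sketch; the midpoint rule, the strip count, and the use of `math.erf` for the exact area are illustrative choices, not the book's method.

```python
import math


def normal_ordinate(x, mean, sd):
    """Eq. (5): height Y of the normal curve at abscissa x."""
    return math.exp(-(x - mean) ** 2 / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def probability_by_strips(lo, hi, mean, sd, strips=10_000):
    """Add up Y * dX over narrow strips between lo and hi (ordinate taken at each strip's midpoint)."""
    dx = (hi - lo) / strips
    return sum(normal_ordinate(lo + (i + 0.5) * dx, mean, sd) * dx for i in range(strips))


def probability_from_erf(lo, hi, mean, sd):
    """Same area via the normal cumulative distribution, written with math.erf."""
    def cdf(x):
        return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))
    return cdf(hi) - cdf(lo)


print(probability_by_strips(-1, 1, 0, 1), probability_from_erf(-1, 1, 0, 1))
print(normal_ordinate(0, 0, 0.1))   # an ordinate, not a probability: it exceeds 1
```

The last line shows why $Y = \varphi(X)$ alone is not a probability for a continuous variable: with a small standard deviation the ordinate at the mean is close to 4, while every area under the curve stays between 0 and 1.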

The Probability Calculus. The Addition Theorem. If the attributes of a given probability set are $X_1, X_2, \ldots, X_n$ (representing either qualitative or quantitative characteristics) and their probabilities are $p_1, p_2, \ldots, p_n$, then the probability of $X_1$, $X_2$, or $X_3$, say, i.e., the attribute of being any one of these X's, is $p_1 + p_2 + p_3$. If the variation in attributes within the probability set is continuous and if the distribution of probability is described by an equation such as $dP = \varphi(X)\,dX$, the probability of an attribute within any one of a number of small ranges dX whose sum constitutes the range $X_a$ to $X_b$ is given by $\sum dP = \sum \varphi(X)\,dX$, or, in the symbolism of the integral calculus,

$$\int_{X_a}^{X_b} dP = \int_{X_a}^{X_b} \varphi(X)\,dX$$

The addition theorem is stated briefly as follows: The probability of either one of two mutually exclusive events (attributes) is the sum of their individual probabilities.

In using this theorem care must be taken to interpret correctly the term “mutually exclusive.” What the term signifies is that the addition theorem applies only to probabilities of one and the same probability set. For the attributes of a probability set are by definition mutually exclusive. Hence the effect of this term is to warn the student to define his probability set carefully at the start. It is to be noted that, while the attributes of a given probability set are mutually exclusive, not all mutually exclusive attributes are members of the same probability set.¹

The Multiplication Theorem. The multiplication theorem pertains to the calculation of a probability of a derived, or second-order, probability set from the probabilities of two or more first-order sets. In deriving the multiplication theorem two cases are distinguished, one pertaining to independent probabilities, the other to dependent probabilities. Consider first the case of independent probabilities.

The numbers of dots on the faces of a die form a set of mutually exclusive attributes, the probability of each of which is 1/6. Two dice form two such probability sets. Each of these may be viewed as a first-order probability set. Consider now the attributes given by the sum of two faces of a pair of dice. If there is no restriction on the way in which the face of one die can be paired with the face of the other die (the condition of independence), then each face of one die can be paired with every face of the other die, and the sum of these faces may take on the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12. If all possible pairs are formed, there will be $6 \times 6 = 36$ pairs. Only one of these will yield the sum 2, viz., the pair 1 and 1. Hence, in this derived, or second-order, probability set of 36 combinations, the probability of a sum having the value 2 is 1/36. This, however, is equal to the product of the probability of a 1 on one die times the probability of a 1 on the other die, that is, $\frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$. This illustrates the multiplication theorem for independent probabilities. It says that, if the probability of attribute B of set II is independent of the probability of attribute A of set I, then the joint probability of A and B in the probability set formed by combining each attribute of set I with every attribute of set II is equal to the product of the probability of A in set I times the probability of B in set II. Succinctly, if P(B) is independent of A, $P(A,B) = P(A) \times P(B)$.

If the probability of B is dependent in some way on the occurrence of the attribute A, then the multiplication theorem for independent probabilities cannot be applied. In its place must be put the multiplication theorem for dependent probabilities. This says that the joint probability of A and B is equal to the probability of A times the probability of B given A. Succinctly, $P(A,B) = P(A) \times P(B/A)$.²

¹ For further discussion, see Smith and Duncan, Elementary Statistics and Applications, p. 269.

² For further discussion, see ibid., pp. 269-274.
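The dice argument, together with the addition theorem, can be checked by listing the 36 pairs and counting with exact fractions. The dependent case uses a hypothetical illustration not taken from the text: a heart on each of two cards drawn without replacement from an ordinary deck.

```python
from fractions import Fraction
from itertools import product

# Second-order set: every face of one die paired with every face of the other.
pairs = list(product(range(1, 7), repeat=2))
p_sum = {s: Fraction(sum(1 for a, b in pairs if a + b == s), len(pairs)) for s in range(2, 13)}

print(len(pairs), p_sum[2])                  # 36 1/36
print(Fraction(1, 6) * Fraction(1, 6))       # 1/36, multiplication theorem (independent)
# Addition theorem: "sum is 2" and "sum is 3" are mutually exclusive attributes of one set.
print(p_sum[2] + p_sum[3])                   # 1/12

# Dependent case (illustrative): a heart on each of two draws without replacement.
p_a = Fraction(13, 52)                       # P(A): first card a heart
p_b_given_a = Fraction(12, 51)               # P(B/A): second a heart, given the first was
print(p_a * p_b_given_a)                     # 1/17
```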


CHAPTER III 


THE SYMMETRICAL BINOMIAL DISTRIBUTION 
AND THE NORMAL CURVE 


The preceding chapter has provided the tools that are now to 
be employed in shaping a general theory of frequency curves. 
This chapter is the first step in the development of that theory. 

The argument will begin with a study of a simple problem 
in combinatorial analysis. The basic data will be 10 coins. 
These will be marked with a head on one side and a tail on the 
other. The problem will be to determine the relative frequencies 
or probabilities of various types of combinations in the whole 
set of combinations that might be made from various arrange- 
ments of the 10 coins. Immediate attention will center on the 
form of this derived distribution of probability, and exact and 
approximate equations will be determined. Ultimately, consideration will be given to how this combinatorial analysis may help to explain some of the frequency distributions that appear in real life.

The Symmetrical Binomial Distribution. Derivation. Suppose there are 10 coins all exactly alike and each having a head and a tail. Each of these coins represents an elementary probability set having two attributes, a head and a tail. Since there is only one head and one tail on a coin, the probability of each is 1/2.

The 10 coins may be arranged in various ways so as to form different combinations of heads and tails. The various possible combinations are as follows: no heads, 10 tails; 1 head, 9 tails; 2 heads, 8 tails; 3 heads, 7 tails; 4 heads, 6 tails; 5 heads, 5 tails; 6 heads, 4 tails; 7 heads, 3 tails; 8 heads, 2 tails; 9 heads, 1 tail; and 10 heads, no tails. The probability of the combination no heads, 10 tails, is the product of ten factors $\frac12 \cdot \frac12 \cdots \frac12 = (\frac12)^{10}$. The probability of the particular combination HTTTTTTTTT is also $(\frac12)^{10}$; but since there are nine other ways in which combinations containing 1 head and 9 tails can be formed, the total probability of a combination containing 1 head and 9 tails is $10(\frac12)^{10}$. In general, the total probability of a combination containing r heads and $10 - r$ tails is $C_r^{10}(\frac12)^{10}$, which is the formula for the (r + 1)th term in the expansion of the binomial $(\frac12 + \frac12)^{10}$. The probabilities of the various combinations of heads and tails are shown in Table 4.

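TABLE 4. PROBABILITIES OF VARIOUS COMBINATIONS OF HEADS AND TAILS AMONG 10 COINS

 Combinations of
 heads and tails      Probability
   H      T
   0     10             1/1,024
   1      9            10/1,024
   2      8            45/1,024
   3      7           120/1,024
   4      6           210/1,024
   5      5           252/1,024
   6      4           210/1,024
   7      3           120/1,024
   8      2            45/1,024
   9      1            10/1,024
  10      0             1/1,024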

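For N coins, the probability of a combination having $N_1$ heads and $N - N_1$ tails is as follows:¹

$$P(N_1) = \frac{N!}{N_1!\,(N - N_1)!}\left(\frac{1}{2}\right)^N \qquad (1)$$

or, if $N_2 = N - N_1$,

$$P(N_1) = \frac{N!}{N_1!\,N_2!}\left(\frac{1}{2}\right)^N \qquad (2)$$

¹ See p. 24.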
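Table 4 follows directly from Eq. (1) with N = 10. The sketch below recomputes it with exact fractions (Python's standard `fractions` module) so that the probabilities can be shown over the common denominator 1,024; the function name is illustrative.

```python
import math
from fractions import Fraction


def symmetrical_binomial(n):
    """Eq. (1): P(N1) = N!/(N1!(N - N1)!) * (1/2)^N for N1 = 0, 1, ..., N heads."""
    return [Fraction(math.factorial(n), math.factorial(k) * math.factorial(n - k)) / 2 ** n
            for k in range(n + 1)]


probs = symmetrical_binomial(10)
for heads, p in enumerate(probs):
    print(f"{heads:>2} H {10 - heads:>2} T   {p * 1024}/1,024")
print(sum(probs))   # 1
```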
Equation (2) is thus the general equation for what is called the “symmetrical binomial distribution.”¹

Characteristics. Mathematical analysis shows that in general the symmetrical binomial distribution has the following characteristics:²

$$\bar{X} = \frac{N}{2} \qquad \sigma = \sqrt{\frac{N}{4}} \qquad \beta_1 = 0 \qquad \beta_2 = 3 - \frac{2}{N} \qquad (3)$$

The variable, it will be noted, is $N_1$, the number of heads.
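The characteristics in Eqs. (3) can be confirmed exactly for particular values of N by computing the moments of the distribution in Eq. (1) with rational arithmetic. This is a minimal sketch; it takes $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$, and the function name is illustrative.

```python
import math
from fractions import Fraction


def binomial_characteristics(n):
    """Mean, variance, beta_1 and beta_2 of the symmetrical binomial, computed exactly."""
    probs = [Fraction(math.factorial(n), math.factorial(k) * math.factorial(n - k)) / 2 ** n
             for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(probs))

    def central_moment(j):
        return sum((k - mean) ** j * p for k, p in enumerate(probs))

    mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)
    return mean, mu2, mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


for n in (10, 40):
    print(n, *binomial_characteristics(n))
```

For N = 10 this gives a mean of 5, a variance of 5/2 (so $\sigma = \sqrt{10/4}$), $\beta_1 = 0$, and $\beta_2 = 14/5 = 3 - 2/10$.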

The Normal Curve. Derivation from Binomial Distribution. 
If 40 coins were used, the distribution of probability of N, heads, 
N — М, tails would be considerably more spread out than when 
10 сошз were tossed. In general, the equation в = 4/N/4 
indieates that the dispersion of the distribution increases in 
proportion to ће VN. If the horizontal scale is reduced, how- 
ever, and the vertical scale enlarged, in the same proportion in 
which the dispersion of the distribution is increased, then the 
effect of increasing N is to bring the ordinates of the distribution 
closer together and to raise them to the height of the original 
distribution. Under these conditions the tops of the ordinates 
tend to sketch out a smooth curve as N is increased. 

Equations (3) suggest that the curve approached as a limit by the symmetrical binomial as $N$ increases is the normal probability curve, for $\beta_1 = 0$, and $\beta_2 = 3 - 2/N$ approaches 3 as $N$ approaches infinity. It can also be shown that if a line is drawn between two adjacent ordinates of the symmetrical binomial, the ratio of the slope of this line to the average of the two ordinates, i.e., the relative slope of the binomial distribution at any mid-point, is

¹ The name follows from the fact that it is the equation for the general term of the expansion of $\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^{N}$ (see pp. 25-26).

² These equations are derived in the Appendix to Chap. IV (pp. 65-67). It will be noticed that the parameters $\bar{\mathbf{X}}$, $\boldsymbol{\sigma}$, etc., are in boldface type since they are population values and not sample statistics. This differentiation must be carefully watched in this volume. Cf. Smith and Duncan, Elementary Statistics and Applications, pp. 123-124, 316.
the same as the relative slope of the normal curve that is traced out by the binomial ordinates.¹ Both these suggestions that the limit of the symmetrical binomial is a normal curve are borne out by strict mathematical analysis.² Thus the limit of the symmetrical binomial is³


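$$P(N_1) = \frac{1}{\boldsymbol{\sigma}\sqrt{2\pi}}\exp\left[-\frac{x^2}{2\boldsymbol{\sigma}^2}\right] \qquad (4)$$

Here $x$ represents a deviation from the mean value and equals $N_1 - \dfrac{N}{2}$.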
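The relative-slope argument can be checked exactly. In this sketch we take "relative slope at a mid-point" to mean the slope of the chord between two adjacent ordinates divided by their average, as the text describes; the sketch is our own illustration. For the symmetrical binomial this ratio coincides with the relative slope $-x/\boldsymbol{\sigma}^2$ of a normal curve whose variance is $(N + 1)/4$. That variance differs from the binomial value $N/4$ only by a term that becomes negligible as $N$ grows.

```python
from fractions import Fraction
from math import comb


def relative_slope_binomial(n, r):
    """Chord slope between ordinates r and r + 1 of Eq. (2), divided by their average."""
    y0 = Fraction(comb(n, r), 2 ** n)
    y1 = Fraction(comb(n, r + 1), 2 ** n)
    return (y1 - y0) / ((y1 + y0) / 2)


def relative_slope_normal(x, variance):
    """Relative slope y'/y of a normal curve, which equals -x / variance."""
    return -x / variance


n = 10
for r in range(n):
    x = Fraction(2 * r + 1, 2) - Fraction(n, 2)  # mid-point r + 1/2, measured from the mean N/2
    assert relative_slope_binomial(n, r) == relative_slope_normal(x, Fraction(n + 1, 4))
print("relative slopes agree for N =", n)
```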

If $z$ is set equal to $x/\boldsymbol{\sigma}$ and if probabilities are measured by areas instead of ordinates, the equation becomes

$$dP = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{z^2}{2}\right]dz \qquad (5)$$

which is the equation for a normal curve whose standard deviation is 1. It is subsequently referred to as the standard normal curve. If $N$ is large, therefore, the probability that $N_1$ lies between $N_1'$ and $N_1''$ can be found approximately by computing

$$\frac{N_1' - N/2}{\sqrt{N/4}} \qquad \text{and} \qquad \frac{N_1'' - N/2}{\sqrt{N/4}}$$

and finding the area under the standard normal curve between these limits. The latter is easily accomplished by reference to the normal probability table given in Table VI of the Appendix.⁴
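The recipe just described takes only a few lines of Python. The sketch below uses `math.erf` in place of the printed normal table (Table VI). For comparison, it also sums the exact probabilities from Eq. (2). $N = 100$ and the limits 40 and 60 are arbitrary illustrative values. Applied literally, the recipe is a fair but not exact approximation here: it gives about .954 against an exact value of about .965.

```python
from math import comb, erf, sqrt


def standard_normal_area(z1, z2):
    """Area under the standard normal curve of Eq. (5) between z1 and z2."""
    def cdf(z):
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return cdf(z2) - cdf(z1)


def normal_approximation(n, lo, hi):
    """Approximate P(lo <= N1 <= hi) using the limits (N1 - N/2) / sqrt(N/4) from the text."""
    mean, sigma = n / 2, sqrt(n / 4)
    return standard_normal_area((lo - mean) / sigma, (hi - mean) / sigma)


def exact_probability(n, lo, hi):
    """The same probability summed exactly from Eq. (2)."""
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2 ** n


print(normal_approximation(100, 40, 60), exact_probability(100, 40, 60))
```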

Significance of the Symmetrical Binomial Distribution and the Normal Curve. The practical significance of the symmetrical binomial distribution and the normal curve is that certain conditions in real life appear to produce frequency distributions that have this form. Consider first a real situation that very closely parallels the combinatorial problem of the preceding sections. Let 10 unbiased coins (unbiased in the sense that they are physically uniform) be tossed at random⁵ a large number of times.

¹ See Appendix to Chap. IV, pp. 74-76.

² Ibid., pp. 68-76.

³ The expression exp [x] is merely another way of writing $e^{x}$. It is frequently used in this text to facilitate the printing of complicated equations.

⁴ For further discussion on how to use the normal probability table, see pp. 120-123, 164-168.

⁵ Whether a method is random or not has to be determined largely by intuition and experience with similar experiments. See pp. 154-162.
Experience with such experiments indicates that the relative frequencies with which 0, 1, 2, . . . , 10 heads appear will be approximately the same as the probabilities of the binomial distribution. This is merely one instance of the law of large numbers referred to in Chap. II.


Again, suppose a large number of flour bags, each weighing 5 pounds at the start, are opened in succession, and a certain quantity of flour is added or taken away in accordance with the following rule: When a bag is opened, 10 coins are tossed, and an ounce of flour is added for each head that appears and subtracted for each tail that appears. For this experiment the law of large numbers suggests that the outcome will be a set of bags varying in weight approximately as follows:


TABLE 5.—RELATIVE FREQUENCIES OF BAGS OF FLOUR OF SPECIFIED WEIGHTS

Weight of Bag         Relative Frequency
4 lb.  6 oz.              1/1,024
4 lb.  8 oz.             10/1,024
4 lb. 10 oz.             45/1,024
4 lb. 12 oz.            120/1,024
4 lb. 14 oz.            210/1,024
5 lb.  0 oz.            252/1,024
5 lb.  2 oz.            210/1,024
5 lb.  4 oz.            120/1,024
5 lb.  6 oz.             45/1,024
5 lb.  8 oz.             10/1,024
5 lb. 10 oz.              1/1,024
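In other words, the distribution of weights will approximately form a symmetrical binomial distribution with a mean weight of 5 pounds and a standard deviation of $2\sqrt{2.5} = 3.162$ ounces. The net deviation of a bag is $2N_1 - 10$ ounces, where $N_1$ is the number of heads, so its standard deviation is twice $\sqrt{10/4}$.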
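The law-of-large-numbers claim can be tried out directly. The sketch below simulates the flour-bag rule with Python's `random` module, a pseudo-random stand-in for tossing physical coins. The seed and the number of bags are arbitrary choices of ours. It then compares the observed relative frequencies with Table 5 and reports the mean and standard deviation in ounces.

```python
import random
import statistics
from collections import Counter
from math import comb


def simulate_bags(n_bags, n_coins=10, start_oz=80, seed=1):
    """Apply the text's rule to n_bags bags: +1 oz per head, -1 oz per tail (5 lb = 80 oz)."""
    rng = random.Random(seed)  # pseudo-random stand-in for tossing physical coins
    weights = []
    for _ in range(n_bags):
        heads = sum(rng.random() < 0.5 for _ in range(n_coins))
        weights.append(start_oz + heads - (n_coins - heads))
    return weights


weights = simulate_bags(100_000)
counts = Counter(weights)
for heads in range(11):
    w = 80 + 2 * heads - 10
    lb, oz = divmod(w, 16)
    observed = counts[w] / len(weights)
    expected = comb(10, heads) / 1024  # relative frequency from Table 5
    print(f"{lb} lb. {oz:2d} oz.   observed {observed:.4f}   expected {expected:.4f}")
print("mean (oz):", statistics.fmean(weights), "  s.d. (oz):", statistics.pstdev(weights))
```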


If a much larger number of coins were employed and the amount of flour added or subtracted per head or tail were made very small, the distribution of weights would become practically continuous and would form a normal curve.

The two examples just given suggest "laboratory" experiments by which a symmetrical binomial distribution and, in the second case, a normal curve might be produced by real events. Certain conditions in everyday life that appear to parallel these laboratory experiments also produce symmetrical binomial distributions and normal curves. Among a number of animals, for example, the biological conditions appear to be such that the chance of male offspring is equal to the chance of female offspring. Distributions of the number of males in families of a given size therefore closely approximate the symmetrical binomial distribution. Again, in many cases of physical measurement there is a host of forces tending to cause slight positive and negative errors, with the result that the net error of measurement is generally distributed like a normal frequency curve.¹ The heights of adult males of the same race, the heights of adult females of the same race, grades of students, the durability of electric-light bulbs, and many other biological, psychological, and physical variables are likewise normally distributed.

Summary of Conditions Leading to Symmetrical Distribution. The foregoing analysis suggests that, whenever the following conditions exist in real life, the data generated by these conditions will tend to be distributed in the form of a symmetrical binomial distribution and, if certain other conditions are also present, in the form of a normal curve.

A. The conditions giving rise to the symmetrical binomial distribution may be stated as follows:

1. In the absence of certain “causes” of variation or in the 
event of a perfect balancing of their effects, the data assume a 


fixed central value (the 5 pounds of the flour illustration). 

2. Deviations from this central value result from certain causes of variation, the effect of any cause being either to add a fixed quantity to the data or to subtract the same quantity (to add or subtract 1 ounce of flour).

3. The probability of a cause of variation producing a positive effect equals the probability of its producing a negative effect,

¹ For a more extended discussion see Smith and Duncan, Elementary Statistics and Applications, pp. 294-295.


38 GENERAL THEORY OF FREQUENCY CURVES 


that is, $P(+) = P(-) = \tfrac{1}{2}$ (the probability of a head equals the probability of a tail).

4. The effects of all contributory causes of variation are of 
equal magnitude (each adds or subtracts 1 ounce of flour). 

5. The contributory causes are independent in their action. 
That is, the probability of a positive or negative contribution 
by any causal factor is independent of the previous contributions 
of other causal factors; the sets of deviations generated by the 
various causes are all independent. 

6. The total deviation of any element from its central value is 
the algebraic sum of the positive and negative contributions of 
the individual causal factors (the total amount of flour added or 
subtracted from a bag is the sum of the ounces added for each 
head tossed minus the ounces subtracted for each tail tossed). 

B. If, in addition to the above conditions, the following also
exist, then the resulting distribution will tend to conform to the 
normal curve: 

7. The number of contributory causes is very large (a large number of coins, instead of only 10 coins, are tossed).

8. The positive and negative contributions of each cause are very small (.01 ounce is added or subtracted instead of 1 ounce).

It is to be noted that, so far as the normal curve is concerned, 
not all these conditions are necessary for its generation. The 
foregoing conditions will produce it, but it can be shown that the 
normal curve may also occur when some of these conditions are 
absent. The normal curve will still be produced if conditions 
2 and 3 are relaxed so that a causal factor may affect the data in 
varying degree and with varying probabilities and also if condition 4 is only approximately and not exactly true.¹ Under certain conditions the requirement of independence (condition 5)
may also be relaxed. 
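The claim that the normal curve survives when conditions 2 to 4 are relaxed can be made concrete without simulation, because the cumulants of independent contributions add. The sketch below is our own illustration. Each cause adds $+d$ with probability $q$ or subtracts $d$ otherwise. The magnitudes and probabilities in the `relaxed` list are arbitrary, hypothetical choices. For equal, symmetric causes the sketch reproduces $\beta_2 = 3 - 2/N$. For many small, unequal causes it gives $\beta_1$ and $\beta_2$ close to the normal values 0 and 3.

```python
def beta_coefficients(causes):
    """Exact beta1 and beta2 of the total deviation produced by independent two-valued causes.

    Each cause is a pair (d, q): it adds +d with probability q and subtracts d with
    probability 1 - q. Cumulants of independent contributions simply add, so no
    simulation is needed.
    """
    k2 = k3 = k4 = 0.0
    for d, q in causes:
        v = q * (1 - q)  # variance of the 0/1 indicator "this cause acts positively"
        k2 += 4 * d**2 * v
        k3 += 8 * d**3 * v * (1 - 2 * q)
        k4 += 16 * d**4 * v * (1 - 6 * v)
    return k3**2 / k2**3, 3 + k4 / k2**2


# Conditions 1-6 with 10 equal causes, P(+) = P(-) = 1/2: beta2 = 3 - 2/N = 2.8.
print(beta_coefficients([(1.0, 0.5)] * 10))

# Conditions 2-4 relaxed: 1,000 small causes of unequal size and unequal probability.
relaxed = [(0.001 * (0.5 + (i % 11) / 10), 0.3 + 0.4 * ((7 * i) % 13) / 12)
           for i in range(1000)]
print(beta_coefficients(relaxed))  # close to the normal-curve values 0 and 3
```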


The most important conditions for the normal curve are 6, 
7, and 8, and condition 4 in an 


(say, .001 ounce for the occurri 


¹ See pp. 137-142.


THE SYMMETRICAL BINOMIAL DISTRIBUTION 39 


— .004 for the occurrence of a four, etc.), provided that the number of dice thrown were very large and the amount added or subtracted per die were very small and of about the same order of magnitude from die to die. The normal curve is thus a more general phenomenon than the symmetrical binomial distribution. Mathematically, the normal curve can be derived from a great variety of different assumptions.¹

¹ Cf. Czuber, Emanuel, Theorie der Beobachtungsfehler, B. G. Teubner, Leipzig, 1891.


CHAPTER IV 
THE PEARSONIAN SYSTEM OF FREQUENCY CURVES 


In the preceding chapter the conditions that would produce the symmetrical binomial distribution and the normal curve were . . . It was pointed out, however, . . . further consideration was postponed until this chapter. This more extensive examination will now be undertaken.


ASYMMETRICAL BINOMIAL DISTRIBUTION

Derivation. Beginning in this section, some of the more important conditions that lead to nonnormality in homogeneous data will be examined. The discussion will again begin with some problems in the realm of combinatorial analysis.
Suppose there are 10 prisms, each having three sides marked with an H and the fourth marked with a T. The probability of an H on each prism is thus $\tfrac{3}{4}$, and the probability of a T is $\tfrac{1}{4}$. This differs from the previ


The particular 
tion of the probabilities 


as a “tail”), but th 5 ill be found to be 
different. For conveni i 


by the letters A, B, C, . . . , J.

Analysis of the problem begins, once again, with the determination of the probability of a particular combination. First consider the combination having no H's, viz.,

A B C D E F G H I J
T T T T T T T T T T

Since the probability of a T on each prism is $\tfrac{1}{4}$, and since the face of each prism is selected independently of the others, the probability of all prisms being T's is, by the multiplication theorem for independent probabilities,

$$\tfrac{1}{4}\cdot\tfrac{1}{4}\cdot\tfrac{1}{4}\cdots\tfrac{1}{4} = \left(\tfrac{1}{4}\right)^{10}$$

Furthermore, the given combination is the only way in which no H's can occur. Hence the probability of no H's is just

$$\left(\tfrac{1}{4}\right)^{10} = \frac{1}{1{,}048{,}576}$$
Consider next the combination

A B C D E F G H I J
H T T T T T T T T T

This is a combination having but one H. Since the probability of A being an H is $\tfrac{3}{4}$ and the probability of each of the other prisms being a T is $\tfrac{1}{4}$, the probability of this particular combination is

$$\tfrac{3}{4}\cdot\tfrac{1}{4}\cdot\tfrac{1}{4}\cdots\tfrac{1}{4} = \left(\tfrac{3}{4}\right)\left(\tfrac{1}{4}\right)^{9}$$
The same would be true of the combination

A B C D E F G H I J
T H T T T T T T T T

or in fact of any combination having but one H. Since there are 10 such combinations altogether, the probability of any of these 10 combinations is, by the addition theorem,

$$10\left(\tfrac{3}{4}\right)\left(\tfrac{1}{4}\right)^{9} = \frac{30}{1{,}048{,}576}$$

This, then, is the probability of one H.


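The following is a combination having but two H's:

A B C D E F G H I J
H H T T T T T T T T

The probability of this particular combination is

$$\tfrac{3}{4}\cdot\tfrac{3}{4}\cdot\tfrac{1}{4}\cdots\tfrac{1}{4} = \left(\tfrac{3}{4}\right)^{2}\left(\tfrac{1}{4}\right)^{8}$$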


This is also the probability of any particular combination having but two H's; and since there are 45 such combinations, the probability of any one of them is

$$45\left(\tfrac{3}{4}\right)^{2}\left(\tfrac{1}{4}\right)^{8} = \frac{405}{1{,}048{,}576}$$

This is the probability of two H's.


In general, the probability of $N_1$ H's with 10 prisms is given by the equation

$$P(N_1) = \frac{10!}{N_1!\,(10 - N_1)!}\left(\tfrac{3}{4}\right)^{N_1}\left(\tfrac{1}{4}\right)^{10 - N_1}$$

For $N_1 = 0$ to $N_1 = 10$, this equation yields the results shown in Table 6.
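TABLE 6.—PROBABILITIES OF VARIOUS COMBINATIONS OF 10 PRISMS, EACH HAVING THREE SIDES MARKED WITH AN H AND ONE WITH A T

Combinations
Having           Probability
 0 H                   1/1,048,576
 1 H                  30/1,048,576
 2 H                 405/1,048,576
 3 H               3,240/1,048,576
 4 H              17,010/1,048,576
 5 H              61,236/1,048,576
 6 H             153,090/1,048,576
 7 H             262,440/1,048,576
 8 H             295,245/1,048,576
 9 H             196,830/1,048,576
10 H              59,049/1,048,576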
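The entries of Table 6, and with the probabilities reversed those of Table 7, can be generated directly from the general equation above. In the sketch below, the hypothetical helper `binomial_numerators` returns the numerators over $4^{10} = 1{,}048{,}576$. The four-sided prism is the assumption carried over from the text.

```python
from math import comb


def binomial_numerators(n, h_sides, sides=4):
    """Numerators of P(N1) over sides**n when h_sides of the prism's sides show H."""
    t_sides = sides - h_sides
    return [comb(n, k) * h_sides**k * t_sides**(n - k) for k in range(n + 1)]


table6 = binomial_numerators(10, 3)  # three sides H: p1 = 3/4, p2 = 1/4
table7 = binomial_numerators(10, 1)  # three sides T: p1 = 1/4, p2 = 3/4
print(table6)
print(table7)
assert sum(table6) == sum(table7) == 4**10 == 1_048_576
assert table7 == table6[::-1]  # reversing p1 and p2 mirrors the distribution
```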
It will be noted that these probabilities are the successive terms of the expansion of $\left(\tfrac{1}{4} + \tfrac{3}{4}\right)^{10}$. The distribution obtained is thus a "binomial" distribution, but it is no longer symmetrical. If $N$ prisms had been used, the probabilities obtained would have been the successive terms of the expansion of $\left(\tfrac{1}{4} + \tfrac{3}{4}\right)^{N}$, and the equation for any one term would have been

$$P(N_1) = \frac{N!}{N_1!\,(N - N_1)!}\left(\tfrac{3}{4}\right)^{N_1}\left(\tfrac{1}{4}\right)^{N - N_1}$$

or, if $N_2 = N - N_1$,

$$P(N_1) = \frac{N!}{N_1!\,N_2!}\left(\tfrac{3}{4}\right)^{N_1}\left(\tfrac{1}{4}\right)^{N_2}$$


TABLE 7.—PROBABILITIES OF VARIOUS COMBINATIONS OF 10 PRISMS, EACH HAVING THREE SIDES MARKED WITH A T AND ONE WITH AN H

Combinations
Having           Probability
 0 H              59,049/1,048,576
 1 H             196,830/1,048,576
 2 H             295,245/1,048,576
 3 H             262,440/1,048,576
 4 H             153,090/1,048,576
 5 H              61,236/1,048,576
 6 H              17,010/1,048,576
 7 H               3,240/1,048,576
 8 H                 405/1,048,576
 9 H                  30/1,048,576
10 H                   1/1,048,576
If each prism is replaced by a group of objects¹ within which the probability of an H is $p_1$ and that of a T is $p_2$, if there are $N$ such groups, and if combinations are composed so as to consist of one object from each group, then the probabilities of the various types of combinations among the set of all possible combinations will be given by the equation

$$P(N_1) = \frac{N!}{N_1!\,N_2!}\,p_1^{N_1}\,p_2^{N_2} \qquad (1)$$

This is the most general equation for the binomial distribution. When $p_1 = p_2 = \tfrac{1}{2}$, it is the equation for the symmetrical binomial distribution. When $p_1 \neq p_2$, it represents an asymmetrical binomial distribution.

¹ The number of objects in each group is not relevant to the argument.

Character of the Asymmetrical Binomial Distribution. A graph of the probabilities of Table 6 is shown in Fig. 9. The asymmetrical character of the distribution is obvious and needs no comment. Table 7 and Fig. 10 show what the distribution of the number of H's would have been if the probabilities of H's and T's had been reversed, i.e., if the probability of an H had been $\tfrac{1}{4}$ and the probability of a T $\tfrac{3}{4}$. It will be noted that this alteration in the probabilities changes the skewness of the distribution from negative to positive. This is generally the case: if the probability of an H is greater than the probability of a T, the distribution of the number of H's will be negatively skewed; and if the probability of an H is less than the probability of a T, the distribution will be positively skewed.¹

Fig. 9.—Asymmetrical binomial distribution, $N = 10$, $p_1 = \tfrac{3}{4}$, $p_2 = \tfrac{1}{4}$ (see Table 6). [Axes: number of H's; probability.]


¹ If the number of T's were taken as the attribute, the reverse would be true.
Characteristics. Mathematical analysis shows that in general the asymmetrical binomial distribution¹ has the following characteristics (the variable is the number of H's, $N_1$):

$$\begin{aligned}
\bar{\mathbf{X}} &= Np_1 \\
\text{Mode} &= \text{integer between } Np_1 - p_2 \text{ and } Np_1 + p_1 \\
\boldsymbol{\sigma} &= \sqrt{Np_1p_2} \\
\beta_1 &= \frac{(p_2 - p_1)^2}{Np_1p_2} \\
\beta_2 &= 3 + \frac{1 - 6p_1p_2}{Np_1p_2}
\end{aligned} \qquad (2)$$

Fig. 10.—Asymmetrical binomial distribution, $N = 10$, $p_1 = \tfrac{1}{4}$, $p_2 = \tfrac{3}{4}$ (see Table 7). [Axes: number of H's; probability.]

As indicated above, the sign of $\sqrt{\beta_1}$ should be the sign of $p_2 - p_1$, so that it will be positive when the distribution is positively skewed and negative when it is negatively skewed.

¹ These equations, of course, also apply to the symmetrical binomial distribution, in the case $p_1 = p_2 = \tfrac{1}{2}$.

These equations are derived in the Appendix to this chapter.¹ It might be well for the nonmathematical student, however, to check them by calculating $\bar{\mathbf{X}}$, $\boldsymbol{\sigma}$, $\beta_1$, and $\beta_2$ of the distribution of Table 7 and then comparing the results with those given by the application of Eqs. (2).²
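The check suggested in the text can be automated. The sketch below is our own illustration. It computes the mean, $\boldsymbol{\sigma}^2$, $\beta_1$, and $\beta_2$ directly from the probabilities of Table 7 and compares them with Eqs. (2). It uses exact fractions so that the comparison is an identity rather than an approximation, and it returns $\boldsymbol{\sigma}^2$ rather than $\boldsymbol{\sigma}$ to stay in exact arithmetic.

```python
from fractions import Fraction
from math import comb


def shape_from_table(n, p1):
    """Mean, sigma**2, beta1, beta2 computed directly from the binomial probabilities."""
    p2 = 1 - p1
    probs = [comb(n, k) * p1**k * p2**(n - k) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(probs))

    def mu(r):
        # r-th moment about the mean
        return sum((k - mean) ** r * p for k, p in enumerate(probs))

    return mean, mu(2), mu(3) ** 2 / mu(2) ** 3, mu(4) / mu(2) ** 2


def shape_from_eqs2(n, p1):
    """The same quantities from Eqs. (2)."""
    p2 = 1 - p1
    npq = n * p1 * p2
    return n * p1, npq, (p2 - p1) ** 2 / npq, 3 + (1 - 6 * p1 * p2) / npq


n, p1 = 10, Fraction(1, 4)  # the distribution of Table 7
assert shape_from_table(n, p1) == shape_from_eqs2(n, p1)
print(shape_from_eqs2(n, p1))
```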

Asymmetrical Binomial Distribution and the Normal Curve. If $N$ is increased and if the horizontal scale is reduced in proportion to $\sqrt{N}$, the tops of the ordinates of the asymmetrical binomial distribution will tend to trace out a smooth curve, just as did those of the symmetrical binomial distribution. In general, the character of this curve will depend on the size of $N$ and on the difference between $p_1$ and $p_2$.

Equations (2) suggest that, if $N$ is very large, any binomial distribution, asymmetrical or symmetrical, will be approximated fairly well by a normal frequency curve. For as $N$ is increased, $\beta_1$ approaches 0 and $\beta_2$ approaches 3, which are the values of these coefficients for a normal curve. This implication is borne out by rigorous mathematical analysis.³ It is to be concluded, then, that if $N$ is very large, an asymmetrical binomial distribution is approximated by a normal frequency curve, just as was the symmetrical binomial distribution.

The conclusion that an asymmetrical curve can be approximated by a symmetrical curve would seem at first glance to be paradoxical. It is to be noted, however, that this is true only when $N$ is very large, and in that case the binomial distribution will not be very asymmetrical. For, as just indicated, the skewness of the asymmetrical binomial distribution diminishes as $N$ increases. This follows directly from the fact that $\beta_1 = (p_2 - p_1)^2/(Np_1p_2)$. It may also be seen from any graphic analysis showing how the shape of a particular binomial distribution, i.e., one with a given $p_1$ and a given $p_2$, changes as $N$ is increased. Such a graphic comparison is presented in Fig. 11, for which the data are shown in Tables 8 and 9.
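Because $\beta_1$ varies inversely with $N$, going from $N = 4$ (Table 8) to $N = 16$ (Table 9) cuts $\beta_1$ to a quarter of its value. The sketch below evaluates the formula from Eqs. (2) for $p_1 = \tfrac{3}{4}$ and several values of $N$. The larger values of $N$ are illustrative additions of ours.

```python
from fractions import Fraction


def beta1(n, p1):
    """beta1 = (p2 - p1)**2 / (N p1 p2), from Eqs. (2)."""
    p2 = 1 - p1
    return (p2 - p1) ** 2 / (n * p1 * p2)


p1 = Fraction(3, 4)
for n in (4, 16, 100, 1000):
    print(f"N = {n:5d}   beta1 = {beta1(n, p1)} = {float(beta1(n, p1)):.5f}")
```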

The size of $N$ that must be attained before the asymmetrical binomial distribution becomes practically symmetrical and can be approximated by the normal curve depends on the relative values of $p_1$ and $p_2$. For, as the formula for $\beta_1$ shows, if the difference between $p_1$ and $p_2$ is very great, then $N$ must be so much the larger if $\beta_1$ is to be close to zero, since $N$ must be large relative to $(p_2 - p_1)^2/p_1p_2$. On the other hand, if $p_1$ and $p_2$ are close, then $N$ need not be very large to make the distribution approximately symmetrical.

¹ See pp. 65-68.

² The direct calculation of these quantities is the same as that carried out in Smith and Duncan, Elementary Statistics and Applications, pp. 284-285, for the symmetrical binomial distribution.

³ See Appendix to this chapter (pp. 68-74).

TABLE 8.—PROBABILITIES OF VARIOUS COMBINATIONS OF FOUR PRISMS, EACH HAVING THREE SIDES MARKED WITH AN H AND ONE WITH A T
(N = 4)

Combinations
Having        Probability
 0 H            .004
 1 H            .047
 2 H            .211
 3 H            .422
 4 H            .316

TABLE 9.—PROBABILITIES OF VARIOUS COMBINATIONS OF 16 PRISMS, EACH HAVING THREE SIDES MARKED WITH AN H AND ONE WITH A T
(N = 16)

Combinations
Having        Probability
 0 H          .00000000023
 1 H          .000000011
 2 H          .00000025
 3 H          .0000035
 4 H          .000034
 5 H          .00025
 6 H          .00136
 7 H          .00583
 8 H          .01966
 9 H          .05243
10 H          .11010
11 H          .18016
12 H          .22520
13 H          .20788
14 H          .13363
15 H          .05345
16 H          .01002

Asymmetrical Binomial Distribution and Pearson's Type III Curve. Although the asymmetrical binomial distribution approaches the normal curve when $N$ is large,
it is nevertheless of interest to find the type of curve that approximates this distribution in those cases when it is still markedly skewed. One approach to this problem is suggested by the previous analysis of the symmetrical binomial.

Fig. 11.—Comparison of two asymmetrical binomial distributions, $N = 4$ and $N = 16$, with $p_1 = \tfrac{3}{4}$, $p_2 = \tfrac{1}{4}$ (see Tables 8 and 9). [Axes: number of H's; probabilities.]

The normal curve, it will be recalled,¹ was found to be the curve that had the same relative slope at various points as did the symmetrical binomial. This suggests that the curve that will give a good fit to the asymmetrical binomial will also be one for which the relative slope at various points is the same as the relative slope of the asymmetrical binomial distribution. Such a line of attack was adopted by Karl Pearson, and the curve that is found to have this property is the one whose logarithmic form is

¹ See pp. 34-35.
$$\log_{10} y = \log_{10} y_0 + ka\log_{10}\left(1 + \frac{x}{a}\right) - kx\log_{10} e \qquad (3)$$

where $x$ represents a deviation from the mode of the curve, $y_0$ the height of the curve at the mode, $y$ the height of the curve at any $x$, $\log_{10} e = .434294$, and $a$ and $k$ are constants depending on the $p_1$, $p_2$, and $N$ of the binomial distribution from which the curve is derived.¹ This curve has come to be known as Pearson's type III curve. The way in which it fits the asymmetrical binomial distribution of Table 10 is shown in Fig. 12.

Fig. 12.—Comparison of asymmetrical binomial (solid line) with Pearson's type III curve (broken line) (see Table 10). [Axes: number of H's; probabilities.]

¹ More specifically,

$$x = \left(N_1 + \tfrac{1}{2}\right) - (N + 1)p_1, \qquad a = \frac{2p_1p_2(N + 1)}{p_2 - p_1}, \qquad k = \frac{2}{p_2 - p_1},$$

$$\log_{10} y_0 = (ka + 1)\log_{10} ka - \log_{10}(ka)! - \log_{10} a - ka\log_{10} e.$$

See Appendix to this chapter (pp. 76-79).


TABLE 10.—COMPARISON OF ASYMMETRICAL BINOMIAL DISTRIBUTION WITH PEARSON'S TYPE III CURVE

        Asymmetrical binomial     Pearson's
  X     distribution¹            type III curve
        P(X)                      P(X)
  0     .056,314                  .061,245
  1     .187,712                  .181,660
  2     .281,568                  .272,720
  3     .250,282                  .243,300
  4     .145,998                  .144,220
  5     .058,399                  .061,333
  6     .016,222                  .019,826
  7     .003,090                  .005,098
  8     .000,386                  .001,079
  9     .000,029                  .000,192
 10     .000,001                  .000,030

¹ Derived from Table 7 (p. 43) by converting the fractions into relative numbers.
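The footnote formulas can be evaluated directly to reproduce Table 10. The sketch below works in natural logarithms, so the factor $\log_{10} e$ drops out. It uses `math.lgamma(ka + 1)` for $\log (ka)!$, since $ka$ need not be an integer in general. It relies on our reading of the partly illegible definition of $x$ as $(N_1 + \tfrac{1}{2}) - (N + 1)p_1$; that reading reproduces the type III column of Table 10 to within about .001. Treat the sketch as an illustration rather than as Pearson's own procedure.

```python
from math import comb, exp, lgamma, log


def type_iii_ordinate(x, a, k):
    """Pearson type III ordinate y = y0 (1 + x/a)**(k a) exp(-k x), Eq. (3) in natural logs."""
    ka = k * a
    # natural-log form of log y0 = (ka + 1) log ka - log (ka)! - log a - ka log e
    log_y0 = (ka + 1) * log(ka) - lgamma(ka + 1) - log(a) - ka
    return exp(log_y0 + ka * log(1 + x / a) - k * x)


n, p1 = 10, 0.25  # the binomial of Table 7
p2 = 1 - p1
a = 2 * p1 * p2 * (n + 1) / (p2 - p1)
k = 2 / (p2 - p1)
for n1 in range(n + 1):
    x = (n1 + 0.5) - (n + 1) * p1  # deviation of N1 from the mode of the curve
    binom = comb(n, n1) * p1**n1 * p2**(n - n1)
    print(f"{n1:2d}   {binom:.6f}   {type_iii_ordinate(x, a, k):.6f}")
```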


HYPERGEOMETRICAL DISTRIBUTION

In the previous section it has been seen that, when the probabilities of the two attributes are not equal, the result is an asymmetrical instead of a symmetrical binomial distribution. The bearing of this upon the generation of nonnormal frequency distributions in real life will be discussed below.² Before turning to these broader aspects of the analysis, however, it is desirable to consider the effects of other modifications of the conditions for normality. It is of interest in particular to consider what happens when the condition of independence (condition 5)³ is removed. This will now be studied in some detail.


¹ The equation for this curve was first published by Karl Pearson in the Proceedings of the Royal Society of London, Vol. 54 (1893), p. 331. It was later discussed at some length in his fundamental paper on frequency curves, "Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material." See the Philosophical Transactions of the Royal Society of London, Series A, Vol. 186 (1895), pp. 356-360. Also see W. P. Elderton, Frequency Curves and Correlation, Cambridge University Press, London (1927), pp. 45, 90-94.

² See p. 60.

³ See p. 38.


Derivation of the Hypergeometrical Distribution. To show the effect of dependent probabilities on the form of a frequency distribution, consider the following card problem. Suppose there are 10 packs of 52 cards, each containing at the start 13 spades, 13 hearts, 13 diamonds, and 13 clubs. Suppose further that all possible combinations are made by selecting one card from each pack in order, subject to the condition that if a card has already been selected from one pack, the same card is not eligible for selection from subsequent packs. For example, if the ten of spades is selected from the first pack, then the ten of spades is not eligible for selection from any other pack. Again, if the ten of spades is selected from the first pack, the five of hearts from the second pack, and the ace of clubs from the third pack, then neither the ten of spades, nor the five of hearts, nor the ace of clubs is eligible for selection from packs 4 to 10. Given this restriction on the selection of cards, let the problem be to find the probabilities of combinations containing various numbers of spades in the whole set of combinations of 10 cards that may be formed in the way described. To facilitate the analysis, let the cards that make up a combination be represented by letters from A to J.

Consider first the combination

A B C D E F G H I J
O O O O O O O O O O

where O stands for a card other than a spade. This is a combination containing no spades. The probability of the first card, A, in the combination being other than a spade is $\tfrac{39}{52}$, for there are 39 nonspades among the 52 cards in the first pack. The probability of the second card, B, being other than a spade is $\tfrac{38}{51}$, for after a nonspade has been selected from the first pack there are only 51 cards left for selection from the second pack, and only 38 of these are nonspades. In like manner the probability of the third card, C, being other than a spade is $\tfrac{37}{50}$, etc.

By the multiplication theorem for dependent probabilities, the probability of the combination containing no spades is therefore

$$\frac{39\cdot 38\cdot 37\cdot 36\cdot 35\cdot 34\cdot 33\cdot 32\cdot 31\cdot 30}{52\cdot 51\cdot 50\cdot 49\cdot 48\cdot 47\cdot 46\cdot 45\cdot 44\cdot 43}$$
Consider next the combination

A B C D E F G H I J
S O O O O O O O O O

This is a combination containing only one spade. Its probability, by the multiplication theorem for dependent probabilities, is equal to the probability of a spade in the first pack (that is, $\tfrac{13}{52}$), multiplied by the probability of a nonspade in the second pack after a spade has been selected from the first pack (that is, $\tfrac{39}{51}$), multiplied by the probability of a nonspade in the third pack after both a spade and a nonspade have been selected from the first and second packs (that is, $\tfrac{38}{50}$), etc. In short, the probability of this particular combination is

$$\frac{13\cdot 39\cdot 38\cdot 37\cdot 36\cdot 35\cdot 34\cdot 33\cdot 32\cdot 31}{52\cdot 51\cdot 50\cdot 49\cdot 48\cdot 47\cdot 46\cdot 45\cdot 44\cdot 43}$$

There are 9 other combinations, however, that contain only one spade, and in each case the probability can be shown to be the same as that just computed. Hence, by the addition theorem, the probability of any one of these 10 combinations, i.e., the probability of a combination containing one spade, is

$$10\cdot\frac{13\cdot 39\cdot 38\cdot 37\cdot 36\cdot 35\cdot 34\cdot 33\cdot 32\cdot 31}{52\cdot 51\cdot 50\cdot 49\cdot 48\cdot 47\cdot 46\cdot 45\cdot 44\cdot 43}$$


Consider next the following combination:

A B C D E F G H I J
S S O O O O O O O O

This is a combination containing two spades. The probability of this particular combination is equal to the probability of a spade in the first pack (that is, $\tfrac{13}{52}$), times the probability of a spade in a deck from which a spade has been drawn (that is, $\tfrac{12}{51}$), times the probability of a nonspade in a deck from which two spades have been drawn (that is, $\tfrac{39}{50}$), times the probability of a nonspade in a deck from which two spades and a nonspade have been drawn (that is, $\tfrac{38}{49}$), etc. In short, the probability of this particular combination is

$$\frac{13\cdot 12\cdot 39\cdot 38\cdot 37\cdot 36\cdot 35\cdot 34\cdot 33\cdot 32}{52\cdot 51\cdot 50\cdot 49\cdot 48\cdot 47\cdot 46\cdot 45\cdot 44\cdot 43}$$

There are $C^{10}_{2} = 45$ different combinations, however, containing only two spades, and the probability of each can be shown to be the same as that just computed. Hence the probability of any
one of these 45 combinations, i.e., the probability of two spades, is

$$45\cdot\frac{13\cdot 12\cdot 39\cdot 38\cdot 37\cdot 36\cdot 35\cdot 34\cdot 33\cdot 32}{52\cdot 51\cdot 50\cdot 49\cdot 48\cdot 47\cdot 46\cdot 45\cdot 44\cdot 43}$$

The same line of reasoning shows that, in general, the probability of $N_1$ spades out of 10 is

$$P(N_1) = C^{10}_{N_1}\,\frac{(13)(12)\cdots(13 - N_1 + 1)\,(39)(38)\cdots(39 - 10 + N_1 + 1)}{(52)(51)(50)(49)(48)(47)(46)(45)(44)(43)}$$

This yields the distribution shown in Table 11.


TABLE 11.—PROBABILITIES OF THE VARIOUS POSSIBLE NUMBERS OF SPADES AMONG THE COMBINATIONS OF 10 CARDS EACH THAT MIGHT BE MADE OF THE CARDS IN AN ORDINARY PLAYING DECK

Combinations
Containing        Probability
 0 spades          40,186/1,000,000
 1 spade          174,140/1,000,000
 2 spades         303,340/1,000,000
 3 spades         278,070/1,000,000
 4 spades         147,460/1,000,000
 5 spades          46,840/1,000,000
 6 spades           8,922/1,000,000
 7 spades             991/1,000,000
 8 spades              60/1,000,000
 9 spades               2/1,000,000
10 spades            0.02/1,000,000

If each of $N$ packs contains $S$ cards, $p_1S$ of which are spades and $p_2S$ of other suits, $p_1 + p_2$ being equal to 1, and if all possible combinations of $N$ cards each are formed by selecting a different card from each pack, the probability of $N_1$ spades will be given by the general equation

$$P(N_1) = C^{N}_{N_1}\,\frac{(p_1S)(p_1S - 1)\cdots(p_1S - N_1 + 1)\,(p_2S)(p_2S - 1)\cdots(p_2S - N + N_1 + 1)}{(S)(S - 1)\cdots(S - N + 1)} \qquad (4)$$

This expression is a term in a hypergeometrical series,¹ and the distribution of Table 11 may be called a hypergeometrical distribution. A somewhat simpler equation that is equivalent to Eq. (4)² is as follows:


$$P(N_1) = \frac{N!\,(p_1S)!\,(p_2S)!\,(S - N)!}{N_1!\,(N - N_1)!\,(p_1S - N_1)!\,(p_2S - N + N_1)!\,S!} \qquad (5)$$
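Equations (4) and (5) can be checked against each other and against Table 11. In the sketch below, whose function names are our own, Eq. (4) is evaluated as written, with falling products. Eq. (5) is rearranged into binomial coefficients, which is algebraically the same expression. Exact fractions make the agreement exact. The printed values, scaled to parts per million, can be compared with Table 11.

```python
from fractions import Fraction
from math import comb, prod


def hypergeometric_eq4(n1, n, s, spades):
    """Eq. (4): C(N, N1) times falling products of spades and nonspades, over S(S-1)...(S-N+1)."""
    others = s - spades
    numerator = prod(spades - i for i in range(n1)) * prod(others - i for i in range(n - n1))
    denominator = prod(s - i for i in range(n))
    return comb(n, n1) * Fraction(numerator, denominator)


def hypergeometric_eq5(n1, n, s, spades):
    """Eq. (5), with its factorials grouped into binomial coefficients."""
    return Fraction(comb(spades, n1) * comb(s - spades, n - n1), comb(s, n))


probs = [hypergeometric_eq4(k, 10, 52, 13) for k in range(11)]
assert probs == [hypergeometric_eq5(k, 10, 52, 13) for k in range(11)]
assert sum(probs) == 1
for k, p in enumerate(probs):
    print(f"{k:2d} spades   {float(p) * 1_000_000:12.2f} per 1,000,000")
```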


In conclusion, it is to be noted that the percentages of 0, 1, 2, . . . , $N$ spades among combinations formed by selecting a different card from each of $N$ packs are the same as the percentages of 0, 1, 2, . . . , $N$ spades among combinations of $N$ cards from a single pack. That these two problems are the same and have the same solution may be shown as follows: In selecting the first card to form a combination, the situation is obviously the same for both problems, since pack 1 of the former problem and the original pack in the new problem are both complete packs. In selecting the second card, the situation continues to be the same for both problems. For, in the former problem, the cards eligible for selection from pack 2 are, according to the terms of the problem, all the cards except the card that matches the one drawn from pack 1; in the present problem the cards eligible for selection are by necessity only the cards left in the pack after the first card has been selected. Similarly, in selecting the third, fourth, fifth, or $N$th card, the situation is the same for both
¹ As $N_1$ goes from 0 to $N$, this equation generates the series

$$\frac{(p_2S)(p_2S - 1)\cdots(p_2S - N + 1)}{S(S - 1)\cdots(S - N + 1)}\left[1 + \frac{N\,(p_1S)}{1!\,(p_2S - N + 1)} + \frac{N(N - 1)\,(p_1S)(p_1S - 1)}{2!\,(p_2S - N + 1)(p_2S - N + 2)} + \cdots\right]$$

A general hypergeometrical series is of the form

$$1 + \frac{(a)(b)}{1!\,(c)}\,x + \frac{(a)(a + 1)(b)(b + 1)}{2!\,(c)(c + 1)}\,x^2 + \cdots$$

The part of the former expression in brackets forms a hypergeometrical series in which $x = 1$, $a = -N$, $b = -p_1S$, and $c = p_2S - N + 1$.

² This is obtained from Eq. (4) by multiplying numerator and denominator by $(p_1S - N_1)!\,(p_2S - N + N_1)!\,(S - N)!$.
problems. Hence it follows that the two problems are identical in all respects and therefore have the same solution.

In general, therefore, if all possible combinations of $N$ cards are made from a deck of $S$ cards, in which the proportion of spades is $p_1$ and the proportion of nonspades is $p_2$, where $p_1 + p_2 = 1$, the relative frequencies of combinations containing $N_1$ spades will be those given by Eq. (4). This conclusion is of importance in certain types of sampling problems.¹

Fig. 13.—A hypergeometrical distribution, $S = 52$, $N = 10$, $p_1 = \tfrac{1}{4}$, $p_2 = \tfrac{3}{4}$ (see Table 11). [Axes: number of spades; probability.]


Character of the Hypergeometrical Distribution. It will be noted from Fig. 13 that the distribution of Table 11 is skewed. In general, the mean of the distribution is at $Np_1$, and the various moments about the mean are²

¹ See pp. 209-211.

² See Pearson, Karl, "On the Curves Which Are Most Suitable for Describing the Frequency of Random Samples of a Population," Biometrika, Vol. 5 (1906-1907), pp. 172-175.
$$\mu_2 = Np_1p_2\left(1-\frac{N-1}{S-1}\right)$$

$$\mu_3 = Np_1p_2(p_2-p_1)\left(1-\frac{N-1}{S-1}\right)\left(1-\frac{2(N-1)}{S-2}\right)$$

$$\mu_4 = Np_1p_2\left(1-\frac{N-1}{S-1}\right)\frac{S(S+1)-6N(S-N)+3p_1p_2\left[N(S-N)(S+6)-2S^2\right]}{(S-2)(S-3)} \qquad (5)$$
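The moment formulas above can be checked exactly with rational arithmetic. This Python sketch enumerates the hypergeometrical distribution for a few decks (K spades among S cards, N cards per combination) and compares its central moments with the three expressions; the decks tried and the helper names are illustrative choices.

```python
from fractions import Fraction
from math import comb

def enumerated(S, N, K):
    """Exact mean and central moments mu2, mu3, mu4 by direct enumeration."""
    probs = [Fraction(comb(K, r) * comb(S - K, N - r), comb(S, N)) for r in range(N + 1)]
    mean = sum(r * p for r, p in enumerate(probs))
    return mean, [sum((r - mean) ** j * p for r, p in enumerate(probs)) for j in (2, 3, 4)]

def from_formulas(S, N, K):
    """Mean N*p1 and the closed forms for mu2, mu3, mu4, with p1 = K/S."""
    p1 = Fraction(K, S)
    p2 = 1 - p1
    f = 1 - Fraction(N - 1, S - 1)                     # 1 - (N-1)/(S-1)
    mu2 = N * p1 * p2 * f
    mu3 = mu2 * (p2 - p1) * (1 - Fraction(2 * (N - 1), S - 2))
    bracket = S * (S + 1) - 6 * N * (S - N) + 3 * p1 * p2 * (N * (S - N) * (S + 6) - 2 * S * S)
    mu4 = mu2 * bracket / ((S - 2) * (S - 3))
    return N * p1, [mu2, mu3, mu4]

for deck in [(52, 10, 13), (52, 10, 26), (20, 12, 5)]:
    print(deck, enumerated(*deck) == from_formulas(*deck))
```

The deck (20, 12, 5) has $2N > S$, so its $\mu_3$ comes out negative even though $p_2 > p_1$.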


It is to be noted that the last factor of $\mu_3$ is negative, so that the sign of $\mu_3$ is reversed, when $2N > S$, or $N/S > \tfrac12$.
These equations show that, if $p_1$ does not equal $p_2$, the hypergeometrical distribution is generally asymmetrical. If $p_2$ is greater than $p_1$, it will be skewed positively; and if $p_1$ is greater than $p_2$, it will be skewed negatively (so long as $N/S < \tfrac12$). If $p_2 = p_1$, the distribution will be symmetrical. The equations also show that the greater the difference between $p_2$ and $p_1$, and hence the smaller the product $p_1p_2$, the larger¹ will be the coefficient of kurtosis $\beta_2 = \mu_4/\mu_2^2$. If S is made indefinitely large relative to N, i.e., if the number of cards in a pack is made indefinitely large relative to the number of cards in a combination, then $p_1S, p_1S - 1, \ldots, p_1S - N_1 + 1$ all become practically equivalent to $p_1S$; similarly, $p_2S, p_2S - 1, \ldots, p_2S - N + N_1 + 1$ become practically equivalent to $p_2S$, and $S, S - 1, \ldots, S - N + 1$ become practically equivalent to $S$.² The hypergeometrical distribution then reduces to the binomial distribution, and its equations for $\beta_1$ and $\beta_2$ become the equations for $\beta_1$ and $\beta_2$ of the binomial distribution [Eqs. (2)]. Such a relationship is logically to be expected; for if the size of each pack of cards is very large relative to the number of cards making up a combination, the probability of a spade in the pack is not much changed in the course of forming combinations. That is, the problem reduces to the one considered in the previous section. In this manner the binomial distribution is a special case of the hypergeometrical distribution.

¹ For $\beta_2$ reduces to the form

$$\beta_2 = 3l + \frac{m - 6np_1p_2}{Np_1p_2}$$

where $m$, $n$, and $l$ are constants depending only on N and S.

² This assumes that S becomes so large that not only S but also $p_1S$ is very large relative to N.
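To illustrate the limiting relationship, this Python sketch evaluates $\beta_2$ in the form given in the footnote for a growing pack (N = 10 and $p_1 = \tfrac14$ held fixed) and compares it with the binomial value $3 + (1 - 6p_1p_2)/Np_1p_2$. The explicit expressions used for $l$, $m$, and $n$ are an assumption of this sketch: one choice consistent with the moment formulas (5), worked out here rather than taken from the text, and each tends to 1 as S grows.

```python
from fractions import Fraction

def beta2_hypergeometric(S, N, p1):
    """beta2 = 3l + (m - 6 n p1 p2) / (N p1 p2); l, m, n depend only on N and S."""
    p2 = 1 - p1
    d = (S - N) * (S - 2) * (S - 3)
    l = Fraction((S - 1) * (S + 6), (S - 2) * (S - 3))
    m = Fraction((S - 1) * (S * (S + 1) - 6 * N * (S - N)), d)
    n = Fraction(S * S * (S - 1), d)
    return 3 * l + (m - 6 * n * p1 * p2) / (N * p1 * p2)

def beta2_binomial(N, p1):
    p2 = 1 - p1
    return 3 + (1 - 6 * p1 * p2) / (N * p1 * p2)

p1 = Fraction(1, 4)
for S in (52, 520, 5200, 52000):
    gap = beta2_hypergeometric(S, 10, p1) - beta2_binomial(10, p1)
    print(S, float(gap))
```

The gap shrinks roughly in proportion to 1/S, which is the sense in which the binomial is the limiting case.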
The Hypergeometrical Distribution and the Pearsonian System of Frequency Curves. The hypergeometrical distribution forms the basis of the Pearsonian system of frequency curves. Having noted that the normal curve had the same relative slope at various points as the symmetrical binomial distribution and that Pearson's type III curve had the same relative slope at various points as the asymmetrical binomial distribution, Karl Pearson raised the question: What curve has the same relative slope at various points as the hypergeometrical distribution? Such a curve, he contended, would be a generalized frequency curve; for in the derivation of the hypergeometrical distribution no assumption was made as to the equality of the probabilities nor was any assumption made as to independence.

Now it can be shown¹ that the relative slope of the hypergeometrical distribution (i.e., the relative slope of the frequency polygon that the distribution forms) is given for various mid-points, $X = N_1 + \tfrac12$, by the expression

$$\text{Relative slope} = -\frac{X + a}{b_0 + b_1X + b_2X^2} \qquad (6)$$

where $a$, $b_0$, $b_1$, and $b_2$ depend on the values of S, N, $p_1$, and $p_2$ of the hypergeometrical distribution. Equation (6), therefore, was taken by Pearson as a general equation capable of representing any frequency distribution. The characteristics of a particular curve represented by this equation will depend on the values of $a$, $b_0$, $b_1$, and $b_2$. The parameter $a$ is of significance in determining the position of the mode. For the slope of the curve is zero at its peak (assuming the curve to be a smooth one), and the mode of the curve is therefore given by $X + a = 0$. That is, the mode comes at $X = -a$. The other parameters determine the general shape of the curve. Thus, if the denominator of Eq. (6) is factored into the product of two expressions, viz.,

$$b_0 + b_1X + b_2X^2 = b_2\left(X + \frac{b_1 - \sqrt{b_1^2 - 4b_0b_2}}{2b_2}\right)\left(X + \frac{b_1 + \sqrt{b_1^2 - 4b_0b_2}}{2b_2}\right)$$

it is seen that the form of the curve will depend on the value of $b_1^2 - 4b_0b_2$. This, it will be noted, is the so-called "discriminant" of the equation $b_0 + b_1X + b_2X^2 = 0$, since its sign determines the nature of its roots. The discriminant may also be written $4b_0b_2\left(\dfrac{b_1^2}{4b_0b_2} - 1\right)$. Accordingly, Karl Pearson took $b_1^2/4b_0b_2$ as the criterion of curve form.² On the basis of this criterion, he distinguished three main types of curves. If the criterion is negative, i.e., if the roots of the equation $b_0 + b_1X + b_2X^2 = 0$ are real and of opposite sign,³ the curve belongs to the class of curves distinguished by Pearson as type I. If the criterion is positive but less than unity, i.e., if the roots of the equation $b_0 + b_1X + b_2X^2 = 0$ are imaginary, the curve belongs to the class of curves distinguished by Pearson as type IV. Finally, if the criterion is positive but greater than unity, i.e., if the roots of the equation $b_0 + b_1X + b_2X^2 = 0$ are real and of the same sign, the curve belongs to the class of curves distinguished by Pearson as type VI.*

¹ See Appendix to this chapter (p. 79).

² For the correct use of the Pearsonian criterion, X should be measured from the mean.

³ Equation (6) is more general than the hypergeometrical distribution from which it was derived, for the constants of a hypergeometrical distribution will never give rise to a type I curve (see Appendix to this chapter, p. 81). Having been suggested by the hypergeometrical distribution, Eq. (6) was taken by Pearson as a general frequency equation in which the constants could have any values whatsoever, whether or not they satisfied the requirements of a hypergeometrical distribution. Other considerations suggested the reasonableness of this. For example, if the causes of variation in X remained the same over the whole range, they would generate a normal distribution, for which the relative slope would be $-x/\sigma^2$. If, however, the causes of variation varied with X, then $\sigma$ would become a function of X, and the relative slope might be written $-x/\sigma^2(X)$. If the function $\sigma^2(X)$ is expanded in a power series (Taylor's expansion) in the neighborhood of the mean of X and the first three terms of this expansion are taken as a good approximation to the function, the equation for the relative slope becomes

$$-\frac{x}{c_0 + c_1x + c_2x^2},$$

where $c_0$, $c_1$, and $c_2$ are constants, which is essentially the same as Eq. (6). (Cf. Pearson, Karl, "Das Fehlergesetz und seine Verallgemeinerungen durch Fechner und Pearson. A Rejoinder," Biometrika, Vol. 4 (1905-1906), p. 204.)
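Equation (6) can be checked against the card example of Fig. 13. The Python sketch below computes the exact relative slope of the hypergeometrical polygon at each mid-point, with X measured from the mean as the footnote on the criterion requires, solves for $a$, $b_0$, $b_1$, $b_2$ from four mid-points using exact fractions, confirms that the same constants reproduce every other mid-point, and evaluates the criterion $b_1^2/4b_0b_2$. The elimination routine and the function names are illustrative assumptions of the sketch.

```python
from fractions import Fraction
from math import comb

def relative_slopes(S, N, K):
    """(X, relative slope) at the mid-points N1 + 1/2, with X measured from the mean N*K/S."""
    y = [Fraction(comb(K, r) * comb(S - K, N - r), comb(S, N)) for r in range(N + 1)]
    mean = Fraction(N * K, S)
    return [(r + Fraction(1, 2) - mean, (y[r + 1] - y[r]) / ((y[r + 1] + y[r]) / 2))
            for r in range(N)]

def solve(rows, rhs):
    """Gauss-Jordan elimination over Fractions (assumes a nonsingular system)."""
    m = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    n = len(m)
    for col in range(n):
        piv = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for i in range(n):
            if i != col:
                factor = m[i][col]
                m[i] = [v - factor * w for v, w in zip(m[i], m[col])]
    return [row[-1] for row in m]

def fit_eq6(S, N, K):
    """Constants of -(X + a) / (b0 + b1 X + b2 X^2) fitted exactly to four mid-points."""
    pts = relative_slopes(S, N, K)
    # -(X + a) = R (b0 + b1 X + b2 X^2)  <=>  a + R b0 + R X b1 + R X^2 b2 = -X
    rows = [[1, R, R * X, R * X * X] for X, R in pts[:4]]
    return solve(rows, [-X for X, _ in pts[:4]]), pts

(a, b0, b1, b2), pts = fit_eq6(52, 10, 13)
criterion = b1 * b1 / (4 * b0 * b2)
print("a =", a, " criterion =", criterion)
```

For this deck the criterion lies between 0 and 1, i.e., the roots are imaginary, which agrees with the type IV description given later for the card version of the flour-bag experiment.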


[Figure 14: frequency curve plotted against age, 5 to 95 years.]

Fig. 14.—Type IV frequency curve. Number exposed to risk of sickness according to Sutton's sickness tables (males, all durations).

[Figure 15: frequencies plotted against age, 17 to 92 years.]

Fig. 15.—Type I frequency curve. Number exposed to risk of sickness. (According to Watson, … p. 19.)

These are the three main classes of Pearsonian frequency curves. A type I curve is shown in Fig. 15. Curves of this type are limited in range. They are usually bell shaped, although skewed, but may under special circumstances be even U shaped, J shaped, or twisted J shaped. Figure 14 shows a type IV curve. These curves are unlimited in range, bell shaped, and skewed. Finally, Fig. 16 shows a type VI curve. These are unlimited in range in one direction; they are usually bell shaped and skewed but may also be J shaped.¹

* See Elderton, op. cit., pp. 42-43. Figures 14, 15, and 16 are reproduced by permission of the author and publishers of this book, 2d ed. (1927), pp. 62, 69, 77.

Various "transitional" types of curves have also been distinguished depending on special values of the b's. For example, if $b_2 = 0$ and the criterion is accordingly infinite, the result is Pearson's type III curve discussed above. If both $b_2$ and $b_1$ are zero, the result is the normal frequency curve. A table of criterion values and curve types is given on page 135.

[Figure 16: frequencies (0 to 200) plotted against age.]

Fig. 16.—Type VI frequency curve. Number of entrants, limited-payment policies, 1863-1893; experience summed in groups of 10 years of age and divided by 100.
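The classification by the criterion, together with the two transitional cases just mentioned, can be written as a small Python function. It is a sketch under the text's conventions (X measured from the mean, $b_0$ nonzero); reporting a criterion of exactly 0 or 1 simply as "transitional" is an assumption, since the text gives no rule for those cases here. The constants in the usage line, $b_0 = 46/27$, $b_1 = 4/27$, $b_2 = 1/54$, are those of the card example of Fig. 13, worked out for this sketch.

```python
def pearson_type(b0, b1, b2):
    """Main Pearsonian type from the criterion b1**2 / (4*b0*b2) of Eq. (6)."""
    if b2 == 0:
        # criterion infinite: type III, or the normal curve when b1 is also zero
        return "normal" if b1 == 0 else "III"
    criterion = b1 * b1 / (4 * b0 * b2)
    if criterion < 0:
        return "I"      # roots real and of opposite sign
    if criterion == 0 or criterion == 1:
        return "transitional"
    if criterion < 1:
        return "IV"     # roots imaginary
    return "VI"         # roots real and of the same sign

print(pearson_type(46 / 27, 4 / 27, 1 / 54))   # card example of Fig. 13
```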

General Significance of the Pearsonian System of Frequency Curves. Explanation of Nonnormal Distributions. The foregoing analysis is of considerable significance in explaining the conditions that give rise to various types of nonnormal frequency distributions. To illustrate this,³ consider once again the flour-bag experiment described in the previous chapter. Suppose as before that the experimenter has bags of flour weighing exactly 5 pounds, and that an ounce of flour is added to or subtracted from each bag according to the throws of 10 prisms: he adds an ounce for each H that faces down and subtracts an ounce for each T that faces down. If the prisms are thrown in a random fashion, intuition suggests that a frequency distribution of the weights of a large number of bags will approximate the form of an asymmetrical binomial distribution for which $p_1 = \tfrac14$ and $p_2 = \tfrac34$. It would be like the relative frequency distribution presented in Table 12.

¹ See ibid., chart opposite p. 46.

² See pp. 47-50.

³ Cf. pp. 36-37. Also see Smith and Duncan, op. cit., pp. 292-293.


TABLE 12.—FREQUENCY DISTRIBUTION OF A LARGE NUMBER OF BAGS OF
FLOUR OF SPECIFIED WEIGHTS

Weight of Bag        Relative Frequency
4 lb.  6 oz.          59,049/1,048,576
4 lb.  8 oz.         196,830/1,048,576
4 lb. 10 oz.         295,245/1,048,576
4 lb. 12 oz.         262,440/1,048,576
4 lb. 14 oz.         153,090/1,048,576
5 lb.  0 oz.          61,236/1,048,576
5 lb.  2 oz.          17,010/1,048,576
5 lb.  4 oz.           3,240/1,048,576
5 lb.  6 oz.             405/1,048,576
5 lb.  8 oz.              30/1,048,576
5 lb. 10 oz.               1/1,048,576

The general shape of this distribution can be represented by a Pearsonian type III curve. Owing in this particular case to the greater probability of subtracting an ounce than of adding an ounce, the mean weight of the bags will be less than the original "central value" of 5 pounds, and the distribution will be skewed positively. If the prisms had had more H than T faces, the probability of adding an ounce would have exceeded the probability of subtracting an ounce, the mean weight would have been greater than 5 pounds, and the distribution of weights would have been skewed negatively.
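Table 12 can be regenerated from the binomial terms. This Python sketch, assuming 10 prisms with probability 1/4 of adding an ounce and 3/4 of subtracting one (the values used above), lists each weight with its relative frequency over $4^{10} = 1{,}048{,}576$ and checks the direction of skewness through the third central moment. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def flour_distribution(prisms=10, p_add=Fraction(1, 4), start_oz=80):
    """Bag weight in ounces -> relative frequency, adding an ounce w.p. p_add per prism."""
    dist = {}
    for adds in range(prisms + 1):
        weight = start_oz + adds - (prisms - adds)
        dist[weight] = comb(prisms, adds) * p_add**adds * (1 - p_add) ** (prisms - adds)
    return dist

def central_moment(dist, j):
    mean = sum(w * p for w, p in dist.items())
    return sum((w - mean) ** j * p for w, p in dist.items())

dist = flour_distribution()
for oz in sorted(dist):
    count = dist[oz] * 4**10                      # numerator over 1,048,576
    print(f"{oz // 16} lb. {oz % 16:>2} oz.  {int(count):>7,}/1,048,576")
```

The mean works out to 75 ounces (4 lb. 11 oz.), below the central value of 80 ounces, and the third central moment is positive.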
If, instead of adding or subtracting flour in accordance with the number of H's and T's appearing on the throws of 10 prisms, the additions and subtractions had been made with reference to the number of spades and the number of other cards found among 10 cards drawn without replacement from a pack of 52 playing cards, then the distribution of weights would have tended to conform to a hypergeometrical distribution, and it could have been represented by one of the Pearsonian curves given by Eq. (6). In the particular case in hand, the relative frequencies with which bags of 4 pounds 6 ounces, bags of 4 pounds 8 ounces, etc., would have tended to occur would be those given in Table 11, and the general shape of the distribution would have been described by a Pearsonian curve of type IV.* If the probability of adding the initial ounce had been the same as that of subtracting it, as it would have been if an ounce had been added whenever a black card appeared and subtracted whenever a red card appeared, then the distribution of weights would have been symmetrical about the normal value of 5 pounds, but the kurtosis of the distribution would have been less than 3, that is, it would have been less peaked than a normal frequency distribution.¹
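The statement about the black-and-red version can be checked by enumeration. The sketch below assumes 10 cards drawn without replacement from 52, with an ounce added for each black card (26 in the pack) and subtracted for each red one; it computes the weight distribution exactly, its mean, and its coefficient of kurtosis $\beta_2 = \mu_4/\mu_2^2$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def card_weights(S=52, N=10, adding_cards=26, start_oz=80):
    """Bag weight in ounces -> probability when N cards are drawn without replacement."""
    total = comb(S, N)
    return {start_oz + 2 * k - N: Fraction(comb(adding_cards, k) * comb(S - adding_cards, N - k), total)
            for k in range(N + 1)}

def mean_and_beta2(dist):
    mean = sum(w * p for w, p in dist.items())
    mu2 = sum((w - mean) ** 2 * p for w, p in dist.items())
    mu4 = sum((w - mean) ** 4 * p for w, p in dist.items())
    return mean, mu4 / mu2**2

mean, beta2 = mean_and_beta2(card_weights())
print(mean, float(beta2))
```

The mean is exactly 80 ounces (5 pounds), the distribution is symmetrical about it, and $\beta_2$ comes out a little below 3.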

Thus, whenever the deviation … value is the algebraic sum of … resulting distribution of the variate …

* See Appendix to this chapter.

¹ Cf. p. 56.
Although this conclusion as to the causes of skewness and kurtosis in homogeneous variates would appear to be generally applicable, one point needs to be further considered. The asymmetrical binomial distribution and the hypergeometrical distribution, upon which this conclusion was based, are both discrete distributions. The number of H's and the number of spades can vary only by integral units. The weights of the various bags of flour derived from either the binomial or the hypergeometrical distribution differed exactly by multiples of 2 ounces; there were no bags of fractional weights. Although the foregoing conclusion would therefore appear to be a valid explanation of the distributions of discrete variates, such as the number of veins in a leaf, the number of petals on a flower, and the number of individuals in a litter, its validity as an explanation of the distributions of continuous variates, i.e., of frequency curves proper, has yet to be established. This will now be considered.

Difficulty in Explaining Nonnormal Curves. In the case of the symmetrical binomial distribution, it was assumed that if the number of contributory causes was made infinitely large and if each contribution was made infinitesimally small, the resulting fluctuations in the variate would become practically continuous. In that instance, it was found that the distribution of the continuous variate was of the form of the normal frequency curve.

If the same method of attaining continuity, however, is applied to the asymmetrical binomial and to the hypergeometrical distributions, the analysis immediately runs into difficulties. For, as already pointed out,¹ if the number of prisms is made very large, the asymmetrical binomial distribution diminishes in skewness; and as N approaches infinity, the distribution approaches the symmetrical form. The same thing is true of the hypergeometrical distribution. If the number of cards in a pack is made larger and larger, the proportion of spades always remaining the same, and if the number of cards making up a combination is also increased, the hypergeometrical distribution likewise diminishes in skewness and its coefficient of kurtosis approaches 3.² The consequence is that, if continuity of a variate is the result of the number of contributory causes being infinitely large while their individual effects are infinitesimally small, the distribution of the variate is of the form of the normal frequency curve and the foregoing analysis fails to explain nonnormal curves.

¹ See p. 46.

² This is shown by Eq. (5). For if both S and N are

Karl Pearson recognized this difficulty in his analysis and sought to meet it as follows: That distributions of continuous biological and physical variates were nonnormal was a fact that had been demonstrated by empirical investigation. "It is clear," argued Pearson, "that if such frequency curves . . . are to be treated as chance distributions at all, it would be idle to compare them to the limit of a symmetrical binomial. We are really quite ignorant as to the nature of the contributory 'causes' in biological, physical, or economic frequency curves. The continuity of such frequency curves may depend upon other features than the magnitude of n. . . . It may possibly be that continuity in biological or physical frequency curves may arise from a limited number of 'contributory causes' with a power of fractionizing the result."¹ For example, in the flour-bag experiment, the amount of flour added or subtracted in each case could be taken as a handful instead of an exact ounce. In such a case, the distribution of the weights of the bags of flour would tend to form a smooth curve even when the number of prisms used was small.²

Thus, in proceeding by argument from the discrete symmetrical binomial to the continuous normal curve, the element of discreteness is blurred and finally obliterated by making the number of causes infinitely large and decreasing the contribution of each cause. The same argument applied to the asymmetrical


increased, such terms as $(N-1)/(S-1)$ are little changed; but the N that appears in the denominator when $\mu_3^2$ is divided by $\mu_2^3$ causes $\beta_1$ to get smaller and smaller as N is increased, and, as N is increased, all terms in the ratio $\mu_4/\mu_2^2$ except 3 reduce to 0.

¹ "Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material," Philosophical Transactions of the Royal Society of London, Series A, Vol. 186 (1895), p. 358.

² Cf. Pearson, Karl, "Das Fehlergesetz und seine Verallgemeinerungen durch Fechner und Pearson. A Rejoinder," Biometrika, Vol. 4 (1905-1906), p. 206.


THE PEARSONIAN SYSTEM OF FREQUENCY CURVES 65 


binomial or to the hypergeometric series will also blur the discreteness, but it will, in addition, cause the nonnormality to disappear. In order to retain the nonnormality, it seems necessary to keep the assumption of a small number of contributing causes; the blurring of the discreteness is accomplished by the assumption that the contributions of the contributing causes vary among themselves.

In the Pearsonian analysis of nonnormal frequency curves, then, it is necessary to suppose that nonnormal series in real life are created by the number of causes affecting variability being small and their contributions varying in magnitude. This supposition is not unreasonable when applied, for example, to such things as the distribution of income, wage rates, and other similar economic phenomena that are known to have nonnormal static variability. In these and other economic and social phenomena the factors causing static variability may reasonably be supposed to be not only finite but also comparatively few in number.
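The difficulty discussed above, that enlarging N removes the nonnormality, can be seen numerically. This Python sketch evaluates $\beta_1$ and $\beta_2$ for the asymmetrical binomial ($p_1 = \tfrac14$) and for a hypergeometrical distribution whose pack and combination grow together (S = 52N/10, so the proportion N/S stays at 10/52), using the moment formulas (5) given earlier in the chapter. The growth schedule is an illustrative choice.

```python
from fractions import Fraction

def binomial_betas(N, p1):
    p2 = 1 - p1
    return (p2 - p1) ** 2 / (N * p1 * p2), 3 + (1 - 6 * p1 * p2) / (N * p1 * p2)

def hypergeometric_betas(S, N, p1):
    """beta1 = mu3**2 / mu2**3 and beta2 = mu4 / mu2**2 from the closed-form moments."""
    p2 = 1 - p1
    f = 1 - Fraction(N - 1, S - 1)
    mu2 = N * p1 * p2 * f
    mu3 = mu2 * (p2 - p1) * (1 - Fraction(2 * (N - 1), S - 2))
    mu4 = mu2 * (S * (S + 1) - 6 * N * (S - N)
                 + 3 * p1 * p2 * (N * (S - N) * (S + 6) - 2 * S * S)) / ((S - 2) * (S - 3))
    return mu3**2 / mu2**3, mu4 / mu2**2

p1 = Fraction(1, 4)
for scale in (1, 10, 100, 1000):
    N, S = 10 * scale, 52 * scale
    print(N, [round(float(v), 5) for v in binomial_betas(N, p1) + hypergeometric_betas(S, N, p1)])
```

Each printed row gives binomial $\beta_1$, $\beta_2$ followed by hypergeometrical $\beta_1$, $\beta_2$; both $\beta_1$ values fall toward 0 and both $\beta_2$ values approach 3.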


APPENDIX 


MATHEMATICAL PROOFS 
A. Proof That, for Any Binomial Distribution Represented by the Equation $P(N_1) = \dfrac{N!}{N_1!\,N_2!}\,p_1^{N_1}p_2^{N_2}$, the Mean $= Np_1$, $\sigma = \sqrt{Np_1p_2}$, $\beta_1 = \dfrac{(p_2-p_1)^2}{Np_1p_2}$, and $\beta_2 = 3 + \dfrac{1-6p_1p_2}{Np_1p_2}$. [Since Eqs. (3) of Chap. II are merely a special case of the foregoing for which $p_1 = p_2 = \tfrac12$, the following is also a proof of them.]

By definition the mean of any distribution of probability is equal to $\Sigma P(X)X$ [Chap. I, Eq. (2)]. Hence the mean of the binomial distribution is

$$\sum P(N_1)N_1 = \sum \frac{N!}{N_1!\,(N-N_1)!}\,p_1^{N_1}p_2^{N-N_1}\,N_1$$

where the summation is for $N_1$ from 0 to N.
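Before following the algebra, the four results stated in the heading can be confirmed for particular cases by direct summation. The Python sketch below computes the moments of the binomial distribution exactly (with fractions) and compares them with $Np_1$, $Np_1p_2$, $(p_2 - p_1)^2/Np_1p_2$, and $3 + (1 - 6p_1p_2)/Np_1p_2$; the cases tried are arbitrary.

```python
from fractions import Fraction
from math import comb

def binomial_summary(N, p1):
    """Mean, variance, beta1 and beta2 of the binomial distribution by direct summation."""
    p2 = 1 - p1
    probs = [comb(N, k) * p1**k * p2 ** (N - k) for k in range(N + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    mu2, mu3, mu4 = (sum((k - mean) ** j * p for k, p in enumerate(probs)) for j in (2, 3, 4))
    return mean, mu2, mu3**2 / mu2**3, mu4 / mu2**2

def stated(N, p1):
    """The values given in the heading of Proof A."""
    p2 = 1 - p1
    q = N * p1 * p2
    return N * p1, q, (p2 - p1) ** 2 / q, 3 + (1 - 6 * p1 * p2) / q

for N, p1 in [(10, Fraction(1, 4)), (7, Fraction(2, 3)), (20, Fraction(1, 2))]:
    print(N, p1, binomial_summary(N, p1) == stated(N, p1))
```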
If $N_1$ is canceled out of the numerator and out of $N_1!$ in the denominator, however, and if $Np_1$ is factored out of the resulting
B. Proof That the Mode of a Binomial Distribution Lies between $Np_1 - p_2$ and $Np_1 + p_1$. If the ordinate of the binomial distribution at $N_1$ is to be the modal ordinate (i.e., the highest), it must satisfy the following criterion:

$$y_{N_1-1} \le y_{N_1} \ge y_{N_1+1}$$

or

$$\frac{N!\,p_1^{N_1-1}p_2^{N-N_1+1}}{(N_1-1)!\,(N-N_1+1)!} \le \frac{N!\,p_1^{N_1}p_2^{N-N_1}}{N_1!\,(N-N_1)!} \ge \frac{N!\,p_1^{N_1+1}p_2^{N-N_1-1}}{(N_1+1)!\,(N-N_1-1)!}$$

On taking out the common factor

$$\frac{N!\,p_1^{N_1-1}p_2^{N-N_1-1}}{(N_1-1)!\,(N-N_1-1)!}$$

these inequalities become

$$\frac{p_2^2}{(N-N_1)(N-N_1+1)} \le \frac{p_1p_2}{N_1(N-N_1)} \ge \frac{p_1^2}{N_1(N_1+1)}$$

From the first inequality it follows that $N_1p_2 \le (N - N_1 + 1)p_1$, or that $N_1(p_1 + p_2) \le Np_1 + p_1$, or, since $p_1 + p_2 = 1$, that $N_1 \le Np_1 + p_1$. From the second inequality it follows that $p_2(N_1 + 1) \ge p_1(N - N_1)$, or that $N_1(p_1 + p_2) + p_2 \ge Np_1$, or that $N_1 \ge Np_1 - p_2$. Hence, in summary, if $N_1$ is to be the mode it must be the integer lying between $Np_1 - p_2$ and $Np_1 + p_1$.
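The bounds can be checked by brute force. This Python sketch finds the highest ordinate of the binomial distribution for several N and $p_1$ and confirms that it lies between $Np_1 - p_2$ and $Np_1 + p_1$. The comparison is inclusive because, when both bounds are integers, two adjacent ordinates are equal and either may be taken as the mode; the sketch takes the first.

```python
from fractions import Fraction
from math import comb

def binomial_mode(N, p1):
    """Index of the highest ordinate (the first one, if two are equal)."""
    probs = [comb(N, k) * p1**k * (1 - p1) ** (N - k) for k in range(N + 1)]
    return max(range(N + 1), key=probs.__getitem__)

def mode_bounds(N, p1):
    return N * p1 - (1 - p1), N * p1 + p1

for N, p1 in [(10, Fraction(1, 4)), (9, Fraction(1, 2)), (25, Fraction(2, 7))]:
    lo, hi = mode_bounds(N, p1)
    print(N, p1, lo, binomial_mode(N, p1), hi)
```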

C. Proof That the Binomial Distribution Is Approximated by the Normal Curve. The general equation for the binomial distribution is

$$P(N_1) = \frac{N!}{N_1!\,N_2!}\,p_1^{N_1}p_2^{N_2} \qquad (1)$$

where $N_2 = N - N_1$ and $p_1 + p_2 = 1$ (cf. page 43). The equation for the symmetrical binomial distribution (cf. page 33) is the special case for which $p_1 = p_2 = \tfrac12$. The following analysis shows that the general binom
If the variable $N_1$ is replaced by $x = N_1 - Np_1$, that is, by the deviation of $N_1$ from its mean, the general equation (1) becomes

$$P(x) = \frac{N!}{(Np_1 + x)!\,(Np_2 - x)!}\,p_1^{Np_1 + x}\,p_2^{Np_2 - x} \qquad (2)$$

This gives the probability of x cases more or less than the mean number. The mean¹ of this distribution of x is, of course, 0, but its standard deviation² is the same as that of $N_1$, viz., $\sqrt{Np_1p_2}$; its $\beta_1$ is likewise $\dfrac{(p_2-p_1)^2}{Np_1p_2}$, and its $\beta_2 = 3 + \dfrac{1-6p_1p_2}{Np_1p_2}$.

For simplicity assume that $Np_1$ and $Np_2$, and hence x, are integers. This assumption does not materially affect the results, since the error involved is of order 1/N and the subsequent argument will assume that N is so large that terms of order 1/N or higher order may reasonably be neglected. In other words, this assumption is good enough for the degree of approximation given by the final equation.

Expression (2) may be written

$$P(x) = P(0)\,\frac{(Np_1)!\,(Np_2)!}{(Np_1 + x)!\,(Np_2 - x)!}\,p_1^{x}\,p_2^{-x} \qquad (3)$$

where

$$P(0) = \frac{N!}{(Np_1)!\,(Np_2)!}\,p_1^{Np_1}\,p_2^{Np_2}$$

or the value of P(x) when x = 0, its mean value. To Eq. (3)

¹ Since $x = N_1 - Np_1$, the mean of x is

$$\Sigma xP(x) = \Sigma P(N_1)N_1 - Np_1\,\Sigma P(N_1).$$

But $\Sigma P(N_1)N_1 = Np_1$, the mean of $N_1$ (cf. p. 66), and $\Sigma P(N_1) = 1$. Hence the mean of x is $Np_1 - Np_1 = 0$.

² Since the mean of x is zero, the variance of x, that is, the square of its standard deviation, equals

$$\Sigma x^2P(x) = \Sigma (N_1 - Np_1)^2P(N_1) = \Sigma N_1^2P(N_1) - 2Np_1\,\Sigma N_1P(N_1) + N^2p_1^2\,\Sigma P(N_1).$$

But $\Sigma N_1P(N_1)$ is the mean of $N_1$ and equals $Np_1$. Also, $\Sigma P(N_1) = 1$. Hence $\Sigma N_1^2P(N_1) - 2Np_1\,\Sigma N_1P(N_1) + N^2p_1^2\,\Sigma P(N_1) = \Sigma N_1^2P(N_1) - N^2p_1^2$, which by the short equation (see p. 67) is the variance of $N_1$. Therefore the standard deviation of x is the same as that of $N_1$.

A similar argument can be used to show that the third and fourth moments of x about its mean are the same as the third and fourth moments of $N_1$ about its mean, and hence that x has the same $\beta_1$ and $\beta_2$.
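Stirling's approximation for factorials may be applied. This is¹

$$a! = a^a e^{-a}\sqrt{2\pi a}$$

which is correct to 1/a, or as good an approximation as is being sought. Making use of this, Eq. (3) becomes

$$\frac{P(x)}{P(0)} = \frac{(Np_1)^{Np_1}e^{-Np_1}\sqrt{2\pi Np_1}\,(Np_2)^{Np_2}e^{-Np_2}\sqrt{2\pi Np_2}\;p_1^{x}p_2^{-x}}{(Np_1+x)^{Np_1+x}e^{-(Np_1+x)}\sqrt{2\pi(Np_1+x)}\,(Np_2-x)^{Np_2-x}e^{-(Np_2-x)}\sqrt{2\pi(Np_2-x)}}$$

which reduces to²

$$\frac{P(x)}{P(0)} = \frac{1}{\left(1 + \dfrac{x}{Np_1}\right)^{Np_1 + x + \frac12}\left(1 - \dfrac{x}{Np_2}\right)^{Np_2 - x + \frac12}} \qquad (4)$$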

A final step is to take logarithms and then expand these in a power series.³ Thus,

¹ More exactly, Stirling's formula says that, as a approaches infinity,

$$a^a e^{-a}\sqrt{2\pi a} < a! < a^a e^{-a}\sqrt{2\pi a}\left(1 + \frac{1}{4a}\right).$$

Accordingly, $1 + \frac{1}{4a}$ gives an estimate of the degree of accuracy of the approximation. This formula may be demonstrated briefly by evaluating the area under the curve $y = \log x$ between ordinates $x = 1$ and $x = n$, which gives, by integration,

$$\text{Area} = \int_1^n \log x\,dx = n\log n - n + 1.$$

An approximate estimate of this same area is obtained, however, by the trapezoid formula [cf. R. Courant, Differential and Integral Calculus (1940), Vol. I, p. 343], erecting ordinates at $x = 1, 2, 3, \ldots, n$. This approximate estimate gives

$$\text{Area} = \tfrac12\log 1 + \log 2 + \log 3 + \cdots + \log(n-1) + \tfrac12\log n = \log n! - \tfrac12\log n,$$

since $\log 1 = 0$. Consequently, $\log n! - \tfrac12\log n \approx n\log n - n + 1$, and hence

$$\log n! \approx \left(n + \tfrac12\right)\log n - n + 1,$$

so that $n! \approx n^{n+\frac12}e^{-n}\,e = n^n e^{-n}\,(2.718)\sqrt n$, which is very close to Stirling's formula, $n^n e^{-n}\sqrt{2\pi n} = n^n e^{-n}\,(2.507)\sqrt n$. For a more precise derivation of Stirling's formula, see ibid., Vol. I, pp. 361-364.

² Accomplished by dividing both numerator and denominator by $(Np_1)^{Np_1 + x + \frac12}(Np_2)^{Np_2 - x + \frac12}$ and canceling out like terms in numerator and denominator. Note that $p_1 + p_2 = 1$.

³ The value of $\log_e(1 + a)$ is given approximately by

$$\log_e(1 + a) = a - \frac{a^2}{2} + \frac{a^3}{3} - \cdots$$
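$$\log_e\frac{P(x)}{P(0)} = -\left(Np_1 + x + \tfrac12\right)\log_e\!\left(1 + \frac{x}{Np_1}\right) - \left(Np_2 - x + \tfrac12\right)\log_e\!\left(1 - \frac{x}{Np_2}\right)$$

$$= -\left(Np_1 + x + \tfrac12\right)\left(\frac{x}{Np_1} - \frac{x^2}{2N^2p_1^2} + \frac{x^3}{3N^3p_1^3} - \cdots\right) - \left(Np_2 - x + \tfrac12\right)\left(-\frac{x}{Np_2} - \frac{x^2}{2N^2p_2^2} - \frac{x^3}{3N^3p_2^3} - \cdots\right)$$

Upon multiplying out and arranging in ascending powers of 1/N, this becomes

$$\log_e\frac{P(x)}{P(0)} = -\frac{x^2 + x(p_2 - p_1)}{2Np_1p_2} + \frac{2x^3(p_2^2 - p_1^2) + 3x^2(p_1^2 + p_2^2)}{12N^2p_1^2p_2^2} + \text{terms in } \frac{1}{N^3} \text{ and higher powers of } \frac1N \qquad (5)$$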
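The accuracy of Eq. (5) can be checked numerically. The Python sketch below compares the exact value of $\log_e[P(x)/P(0)]$ for a binomial with N = 10,000 and $p_1 = \tfrac14$ (so that $Np_1$ is an integer, as assumed above) with the normal term alone and with the two displayed terms of Eq. (5), at $x = \pm 3\sigma$; `math.lgamma` supplies the log-factorials. The particular N and $p_1$ are arbitrary choices.

```python
import math

def log_binomial(N, p1, k):
    """Natural log of N!/(k!(N-k)!) * p1**k * p2**(N-k)."""
    p2 = 1 - p1
    return (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
            + k * math.log(p1) + (N - k) * math.log(p2))

def eq5(N, p1, x):
    """The two displayed terms of Eq. (5)."""
    p2 = 1 - p1
    q = p1 * p2
    return (-(x * x + x * (p2 - p1)) / (2 * N * q)
            + (2 * x**3 * (p2**2 - p1**2) + 3 * x * x * (p1**2 + p2**2)) / (12 * N**2 * q**2))

N, p1 = 10_000, 0.25
mean, sigma = round(N * p1), math.sqrt(N * p1 * (1 - p1))
for x in (-round(3 * sigma), round(3 * sigma)):
    exact = log_binomial(N, p1, mean + x) - log_binomial(N, p1, mean)
    normal_only = -x * x / (2 * N * p1 * (1 - p1))
    print(x, exact, eq5(N, p1, x), normal_only)
```

At three standard deviations the normal term alone is off by a few hundredths, while Eq. (5) is off by only about a thousandth.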


Now the equation for the standard deviation of x, viz.,

$$\sigma = \sqrt{Np_1p_2},$$

shows that variation in x is of order $\sqrt N$. In other words, as N increases, the variation in x increases as $\sqrt N$. Hence, terms such as $x/Np_1p_2$ are of order $\sqrt N/N$ or $1/\sqrt N$, and terms

for $-1 < a < 1$. To demonstrate this, note that by the integral calculus

$$\log_e(1 + a) = \int_0^a \frac{dt}{1 + t}.$$

If the division $\dfrac{1}{1 + t}$ is carried out to n terms and the integration carried out term by term, it follows that

$$\log_e(1 + a) = a - \frac{a^2}{2} + \frac{a^3}{3} - \cdots + (-1)^{n-1}\frac{a^n}{n} + (-1)^n\int_0^a \frac{t^n}{1 + t}\,dt.$$

For $0 \le a \le 1$, the integral term is at most $a^{n+1}/(n + 1)$, which approaches 0 as n approaches ∞. If $-1 < a \le 0$, the absolute value of the integral term is less than or equal to $\dfrac{|a|^{n+1}}{(n + 1)(1 + a)}$. (Cf. R. Courant, op. cit., Vol. I, pp. 315-317.) Since the standard deviation of x equals $\sqrt{Np_1p_2}$, it follows that x is of order $\sqrt N$. Hence $x/Np_1$ and $x/Np_2$ are of order $1/\sqrt N$, and if N is taken large enough, $|x|/Np_1$ and $|x|/Np_2$ will for some value of N become less than 1 for the important part of the distribution lying between $x = -3\sigma$ and $x = 3\sigma$, say.
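such as $x^2/N^2p_1^2p_2^2$ are of order $N/N^2$ or $1/N$. If in Eq. (5) terms of order 1/N and those of higher order are neglected, then

$$\log_e\frac{P(x)}{P(0)} = -\frac{x^2}{2Np_1p_2} - \frac{x(p_2 - p_1)}{2Np_1p_2} + \frac{x^3(p_2^2 - p_1^2)}{6N^2p_1^2p_2^2}$$

or, since $p_2^2 - p_1^2 = (p_2 - p_1)(p_2 + p_1)$ and $p_1 + p_2 = 1$,

$$\log_e\frac{P(x)}{P(0)} = -\frac{x^2}{2Np_1p_2} + \frac{p_2 - p_1}{2Np_1p_2}\left(\frac{x^3}{3Np_1p_2} - x\right)$$

On replacing $Np_1p_2$ by $\sigma^2$, etc., this becomes

$$\log_e\frac{P(x)}{P(0)} = -\frac{x^2}{2\sigma^2} + \frac{p_2 - p_1}{2\sigma^2}\left(\frac{x^3}{3\sigma^2} - x\right)$$

or, on taking antilogarithms,

$$P(x) = P(0)\,e^{-\frac{x^2}{2\sigma^2}}\,e^{\frac{p_2 - p_1}{2\sigma^2}\left(\frac{x^3}{3\sigma^2} - x\right)} \qquad (6)$$

To complete this equation, P(0) must be evaluated. This may be done as follows: By definition, P(0) equals

$$\frac{N!}{(Np_1)!\,(Np_2)!}\,p_1^{Np_1}p_2^{Np_2}.$$

Using Stirling's formula,

$$P(0) = \frac{N^Ne^{-N}\sqrt{2\pi N}\;p_1^{Np_1}p_2^{Np_2}}{(Np_1)^{Np_1}e^{-Np_1}\sqrt{2\pi Np_1}\,(Np_2)^{Np_2}e^{-Np_2}\sqrt{2\pi Np_2}}$$

which reduces to

$$P(0) = \frac{1}{\sqrt{2\pi Np_1p_2}} = \frac{1}{\sigma\sqrt{2\pi}},$$

all within an error of order 1/N or $1/\sigma^2$. Equation (6) may thus be rewritten

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2} + \frac{p_2 - p_1}{2\sigma}\left(\frac{x^3}{3\sigma^3} - \frac{x}{\sigma}\right)} \qquad (7)$$

This gives the value of P(x) within a margin of error of order $1/\sigma^2$ or 1/N. If N is sufficiently large, so that terms of order $1/\sigma$ or $1/\sqrt N$ can reasonably be neglected, Eq. (7) reduces to

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}} \qquad (8)$$

This is the equation for the normal distribution.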
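Equations (7) and (8) can be compared with exact binomial ordinates. The Python sketch below assumes N = 400 and $p_1 = \tfrac14$ (so that $Np_1 = 100$ is an integer) and computes the largest error of each approximation over the central range $|x| \le 3\sigma$; with $p_1 = p_2$ the two formulas coincide, as the following paragraph notes. The choice of N and $p_1$ is illustrative.

```python
import math

def binomial_ordinate(N, p1, k):
    return math.comb(N, k) * p1**k * (1 - p1) ** (N - k)

def eq8(x, sigma):
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def eq7(x, sigma, p1):
    p2 = 1 - p1
    skew = (p2 - p1) / (2 * sigma) * (x**3 / (3 * sigma**3) - x / sigma)
    return math.exp(-x * x / (2 * sigma * sigma) + skew) / (sigma * math.sqrt(2 * math.pi))

def max_errors(N, p1):
    """Largest |binomial - Eq. (7)| and |binomial - Eq. (8)| for |x| <= 3 sigma."""
    mean, sigma = N * p1, math.sqrt(N * p1 * (1 - p1))
    ks = [k for k in range(N + 1) if abs(k - mean) <= 3 * sigma]
    err7 = max(abs(binomial_ordinate(N, p1, k) - eq7(k - mean, sigma, p1)) for k in ks)
    err8 = max(abs(binomial_ordinate(N, p1, k) - eq8(k - mean, sigma)) for k in ks)
    return err7, err8

print(max_errors(400, 0.25))
```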
It will be noted that, when $p_1 = p_2$, Eqs. (7) and (8) become identical. The normal curve is thus a better approximation to the symmetrical than to the asymmetrical binomial distribution, as common sense suggests. Equation (8), however, will also give results very close to those of (7) wherever the difference $p_2 - p_1$ is sufficiently small relative to $\sigma$. That is, if $p_1$ and $p_2$ differ little absolutely, and hence the binomial distribution is only slightly skewed even for small values of N, or if N is so large that

$$\frac{p_2 - p_1}{\sigma} = \frac{p_2 - p_1}{\sqrt{Np_1p_2}} = \sqrt{\beta_1}$$

is small, then the normal distribution becomes a satisfactory "first" approximation even to the asymmetrical binomial distribution. But if N is not very large and if $p_1$ differs radically from $p_2$, then the binomial distribution is rather skewed and the second approximation (7) had better be used. Of course, if N is very small, then it is preferable to use the binomial formula itself, for in this case neglect of terms of order 1/N may lead to serious error.

Equation (8) shows that the ordinates of the binomial distribution may be approximated by the ordinates of the normal curve. It will be noted that, as N increases, the ordinates of the normal curve, as well as the ordinates of the binomial distribution (cf. page 34), tend to get smaller and smaller and the curve becomes more spread out. This is because N enters into the formula for $\sigma$ (that is, $\sigma = \sqrt{Np_1p_2}$). The larger the value of N, therefore, the larger the standard deviation and, according to Eq. (8), the smaller any particular ordinate (for $\sigma$ enters into the denominator of this equation). If, however, the scales on which the curve is graphed are varied in proportion to $\sigma$, that is, if the vertical scale is lengthened and the horizontal scale is shortened in proportion to $\sigma$ (that is, to $\sqrt N$), then the normal curve retains a constant shape, namely, that of the standard normal curve,

$$y = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}}$$

where $z = x/\sigma$. This is the basis for the statement on page 34 of the text that, if the scales are adjusted in this way, the limit of the binomial distribution as N is increased is the standard normal curve.
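In practice, as the next paragraph explains, a sum of binomial ordinates over a range is read from an area under the normal curve. The Python sketch below makes that comparison for two illustrative cases, using `math.erf` for the normal area. Because each ordinate is represented by a rectangle centered on it, the area is taken from half a unit below the first ordinate to half a unit above the last; this half-unit allowance is this sketch's reading of the rectangle construction.

```python
import math

def binomial_sum(N, p1, lo, hi):
    """Sum of binomial ordinates for N1 = lo, ..., hi."""
    return sum(math.comb(N, k) * p1**k * (1 - p1) ** (N - k) for k in range(lo, hi + 1))

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_area(N, p1, lo, hi):
    """Area under the normal curve having the binomial mean and standard deviation."""
    mean, sigma = N * p1, math.sqrt(N * p1 * (1 - p1))
    return normal_cdf((hi + 0.5 - mean) / sigma) - normal_cdf((lo - 0.5 - mean) / sigma)

for N, p1, lo, hi in [(100, 0.5, 45, 55), (400, 0.25, 90, 110)]:
    print(N, p1, binomial_sum(N, p1, lo, hi), normal_area(N, p1, lo, hi))
```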
The above shows that a binomial ordinate can be estimated by an ordinate of the normal curve and that a sum of binomial ordinates can be estimated from the sum of certain ordinates of the normal curve. What is done in practice, however, is to estimate a sum of binomial ordinates over a given range from the area under the normal curve for that range, and it remains to be shown that an area under the normal curve is the approximate equivalent of a sum of normal curve ordinates. This is demonstrated as follows: An ordinate of a normal curve,

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}},$$

can always be represented by a rectangle of height $\dfrac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}$ and base $1/\sigma$. Furthermore, if the x-scale is measured in standard deviation units, a succession of ordinates will be $1/\sigma$ units apart, and the succession of rectangles representing them will all touch each other or, in other words, be contiguous. As N is increased and $1/\sigma = 1/\sqrt{Np_1p_2}$ is decreased, rectangles over any range will be thinner and their area will approach the area of the standard normal curve for that range. Hence the area under the standard normal curve can be used as an estimate of the sum of a series of normal ordinates and thus of the corresponding series of binomial ordinates.

D. Proof That the Relative Slope of the Symmetrical Binomial Polygon Is the Same at Any Abscissa Mid-point as That of a Normal Curve with the Same Mean as the Binomial Distribution and a Variance Equal to $\dfrac{N+1}{N}$ Times the Binomial Variance.¹ The ordinate of the symmetrical binomial distribution at any abscissa point $N_1$ is

$$y_{N_1} = \frac{N!}{N_1!\,(N - N_1)!}\left(\frac12\right)^N$$

and the ordinate at the next abscissa point $N_1 + 1$ is

$$y_{N_1+1} = \frac{N!}{(N_1 + 1)!\,(N - N_1 - 1)!}\left(\frac12\right)^N$$

¹ This and the next two proofs are based upon Karl Pearson's analysis in the Philosophical Transactions of the Royal Society of London, Series A, Vol. 186 (1895), pp. 355 f.
The difference between these ordinates is

$$\Delta y_{N_1} = y_{N_1+1} - y_{N_1} = \frac{N!}{N_1!\,(N - N_1 - 1)!}\left(\frac12\right)^N\left(\frac{1}{N_1 + 1} - \frac{1}{N - N_1}\right)$$

and since the abscissa interval is $N_1 + 1 - N_1 = 1$, the absolute slope of the side of the polygon joining the tops of these two ordinates is this difference $\Delta y_{N_1}$ (see Fig. 3, page 4).

The ordinate of the polygon at the abscissa mid-point $N_1 + \tfrac12$ is the average of the two ordinates at the abscissa points $N_1$ and $N_1 + 1$. Hence the ordinate at this mid-point is

$$y_{N_1+\frac12} = \frac12\left(y_{N_1+1} + y_{N_1}\right) = \frac12\,\frac{N!}{N_1!\,(N - N_1 - 1)!}\left(\frac12\right)^N\left(\frac{1}{N_1 + 1} + \frac{1}{N - N_1}\right)$$

The relative slope at the abscissa mid-point $N_1 + \tfrac12$ is defined as the ratio of the absolute slope to the ordinate at that mid-point. Hence the relative slope of the polygon at the abscissa mid-point $N_1 + \tfrac12$ is

$$\frac{\Delta y_{N_1}}{y_{N_1+\frac12}} = \frac{2\left[(N - N_1) - (N_1 + 1)\right]}{(N - N_1) + (N_1 + 1)} = \frac{2(N - 2N_1 - 1)}{N + 1} = -\frac{2\left(N_1 + \frac12 - \frac N2\right)}{\frac{N+1}{2}}$$

If x is set equal to $N_1 + \tfrac12 - \tfrac N2$, that is, to $N_1 + \tfrac12$ minus the mean of $N_1$, and if $k^2$ is set equal to $\dfrac{N+1}{4}$, this expression for the relative slope at $N_1 + \tfrac12$ becomes

$$\frac{\Delta y_{N_1}}{y_{N_1+\frac12}} = -\frac{2x}{2k^2} = -\frac{x}{k^2}$$

The relative slope of the normal curve, $y = \dfrac{1}{\sigma\sqrt{2\pi}}e^{-\frac{x^2}{2\sigma^2}}$, at any abscissa point $x = X - \bar X$ is

$$\frac1y\frac{dy}{dx} = \frac{d\log y}{dx} = \frac{d}{dx}\left(\log\frac{1}{\sigma\sqrt{2\pi}} - \frac{x^2}{2\sigma^2}\right) = -\frac{x}{\sigma^2}$$

Hence $-x/k^2$ is the relative slope of a normal curve at the point x whose standard deviation is $k = \sqrt{\dfrac{N+1}{4}}$. Since the standard deviation of the symmetrical binomial distribution is $\sqrt{N/4}$, the normal curve that has the same relative slope as the polygon of a symmetrical binomial distribution at any abscissa mid-point is the one that has the same mean as the mean of the binomial distribution and a variance equal to $\dfrac{N+1}{N}$ times the variance of the binomial distribution. If N is large, the two variances are practically the same, for then $\dfrac{N+1}{N}$ is practically equal to 1.
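The identity just proved can be confirmed exactly for particular N. The Python sketch below computes the relative slope of the symmetrical binomial polygon at every mid-point with fractions and checks that it equals $-x/k^2$ with $k^2 = (N + 1)/4$, and not $-x/(N/4)$; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def symmetric_relative_slopes(N):
    """(x, relative slope) at each mid-point N1 + 1/2 of the symmetrical binomial polygon."""
    y = [Fraction(comb(N, k), 2**N) for k in range(N + 1)]
    return [(n1 + Fraction(1, 2) - Fraction(N, 2),             # mid-point minus the mean N/2
             (y[n1 + 1] - y[n1]) / ((y[n1 + 1] + y[n1]) / 2))  # absolute slope / mid ordinate
            for n1 in range(N)]

def matches_normal(N):
    k2 = Fraction(N + 1, 4)                                   # k^2 = (N + 1)/4
    return all(R == -x / k2 for x, R in symmetric_relative_slopes(N))

print(all(matches_normal(N) for N in range(1, 30)))
```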

E. Proof That the Relative Slope of the General Binomial Polygon Is the Same at Any Abscissa Mid-point as That of a Pearsonian Type III Curve. The ordinate of the general binomial distribution at any abscissa point $N_1$ is

$$y_{N_1} = \frac{N!}{N_1!\,(N - N_1)!}\,p_1^{N_1}p_2^{N-N_1}$$

and the ordinate at the abscissa point $N_1 + 1$ is

$$y_{N_1+1} = \frac{N!}{(N_1 + 1)!\,(N - N_1 - 1)!}\,p_1^{N_1+1}p_2^{N-N_1-1}$$

The difference between these ordinates is

$$\Delta y_{N_1} = y_{N_1+1} - y_{N_1} = \frac{N!\,p_1^{N_1}p_2^{N-N_1-1}}{N_1!\,(N - N_1 - 1)!}\left(\frac{p_1}{N_1 + 1} - \frac{p_2}{N - N_1}\right)$$

and since the distance between the abscissa points $N_1 + 1$ and $N_1$ is 1, the absolute slope of the side of the polygon joining the tops of these two ordinates is $\Delta y_{N_1}$.

The ordinate of the polygon at the abscissa mid-point $N_1 + \tfrac12$ is the average of the two ordinates at the abscissa points $N_1$ and $N_1 + 1$, that is,

$$y_{N_1+\frac12} = \frac12\left(y_{N_1+1} + y_{N_1}\right) = \frac12\,\frac{N!\,p_1^{N_1}p_2^{N-N_1-1}}{N_1!\,(N - N_1 - 1)!}\left(\frac{p_1}{N_1 + 1} + \frac{p_2}{N - N_1}\right)$$

The relative slope at the abscissa mid-point $N_1 + \tfrac12$ is the ratio of the absolute slope to the ordinate at that mid-point. This has the value
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Дух, _ Уха — Ум _ AN- М№,)р: — (Ni + Up: 
Ux — Sea F Ум) (N = Мур, + W: + Up: 
2(Npı — Ni — P?) 
Np. + (р: — ра) №, + pe 


which may be put in terms of the abscissa mid-point №, + 3 by 
adding and subtracting + to the numerator and adding and sub- 
tracting (pə — pi) in the denominator, as follows: 


| ANp- (№ + 3) — Pe +3] 
Np. + (р, — p1)(Nı + 2) + Pe — 20р: — р) 


If now z is set equal to (№ +4) = Ni + р: — 4, that is, 
to the deviation of Ni + 3 from what is practically the mode 
of the binomial distribution (see page 68), the foregoing expres- 
Sion for the relative slope can be put in the form 
Relativ > —2x 

elative slope — Nie рй + EST FH 

: + ps — #(рз — р) 


Which, upon making use of the fact that pi + ps = 1, reduces to 


" — 2r E 
Relative slope = g gp;  2рург + (р: — рут 


2 
چ کے‎ 
pe — Рі 
2рар(М + 1) mA 
р» — P: 
"e 1 
| aT 
Wheret 
з 2 _ 2рюр(У + 1) 
imi CD and Ae uc d 
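A similar numerical check applies to the general binomial. In the Python sketch below, which is added for illustration and uses hypothetical function names, the relative slope of the polygon is computed from the ordinates $y_{N_1}$. It is then compared with $-kx/(a + x)$, where $x$ is measured from $(N + 1)p_1 - \tfrac12$. That point equals $Np_1 - p_2 + \tfrac12$ because $p_1 + p_2 = 1$. The case $p_1 = p_2$ is excluded, since $k$ and $a$ are then undefined.

```python
from math import comb


def binomial_ordinate(N, n1, p1):
    """Ordinate y_{N1} of the binomial distribution (p1 + p2)^N at N1 = n1."""
    p2 = 1 - p1
    return comb(N, n1) * p1 ** n1 * p2 ** (N - n1)


def polygon_relative_slope(N, n1, p1):
    """Absolute slope of the polygon side from n1 to n1 + 1, divided by the
    mid-ordinate at n1 + 1/2."""
    y0, y1 = binomial_ordinate(N, n1, p1), binomial_ordinate(N, n1 + 1, p1)
    return (y1 - y0) / ((y0 + y1) / 2)


def type_iii_relative_slope(N, n1, p1):
    """-k x / (a + x), with x = (n1 + 1/2) - ((N + 1) p1 - 1/2); needs p1 != p2."""
    p2 = 1 - p1
    x = (n1 + 0.5) - ((N + 1) * p1 - 0.5)
    k = 2 / (p2 - p1)
    a = 2 * p1 * p2 * (N + 1) / (p2 - p1)
    return -k * x / (a + x)


N, p1 = 10, 0.3  # illustrative values only
for n1 in range(N):
    print(n1, polygon_relative_slope(N, n1, p1), type_iii_relative_slope(N, n1, p1))
```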


To find the curve whose relative slope at the point $x$ is that of
the binomial polygon, it is necessary merely to integrate. This
is done as follows:

† The italic $k$ and $a$ used in this proof are the same as the boldface k and a used
in the body of the text. Italic $k$ and $a$ are used here to simplify the printing
of the mathematical derivation; they do not signify sample values but represent
population values just as k and a do in the text.
Set

$$\frac{1}{y}\frac{dy}{dx} = \frac{-kx}{a + x}$$

But

$$\frac{1}{y}\frac{dy}{dx} = \frac{d \log_e y}{dx}$$

and, by division,

$$\frac{kx}{a + x} = k - \frac{ka}{a + x}$$

Hence,

$$\frac{d \log_e y}{dx} = -k + \frac{ka}{a + x}$$

and, upon integrating,

$$\log_e y = -kx + ka \log_e (a + x) + \log_e c$$

Therefore,

$$y = c\,e^{-kx}(a + x)^{ka} = c'\left(1 + \frac{x}{a}\right)^{ka} e^{-kx}$$

where $c$ is the constant of integration and $c' = c\,a^{ka}$.
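The integrated curve can be checked against the differential equation it came from. The Python sketch below assumes $k > 0$ and $a > 0$ (the case $p_1 < p_2$) and uses the arbitrary illustrative values $k = 2$, $a = 3$. It takes for $c'$ the unit-area value $k(ka)^{ka}/[e^{ka}\,\Gamma(ka + 1)]$ obtained in the following paragraphs. It then estimates $d \log_e y/dx$ by central differences and integrates the curve from $x = -a$ by Simpson's rule. The helper names are hypothetical.

```python
import math


def type_iii(x, k, a):
    """Pearson type III ordinate y0 (1 + x/a)^(ka) e^(-kx), origin at the mode.

    Assumes k > 0 and a > 0. y0 = k (ka)^(ka) / (e^(ka) Gamma(ka + 1)) is the
    unit-area value of c'; it is evaluated through logarithms so that a
    large ka does not overflow.
    """
    if x <= -a:
        return 0.0  # the curve is taken as zero to the left of x = -a
    ka = k * a
    log_y0 = math.log(k) + ka * math.log(ka) - ka - math.lgamma(ka + 1)
    return math.exp(log_y0 + ka * math.log1p(x / a) - k * x)


def relative_slope_numeric(x, k, a, h=1e-5):
    """Central-difference estimate of d(log y)/dx."""
    return (math.log(type_iii(x + h, k, a)) - math.log(type_iii(x - h, k, a))) / (2 * h)


def area(k, a, width=40.0, n=4000):
    """Simpson's rule over [-a, -a + width]; n must be even."""
    lo = -a
    h = width / n
    total = type_iii(lo, k, a) + type_iii(lo + width, k, a)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * type_iii(lo + i * h, k, a)
    return total * h / 3


k, a = 2.0, 3.0  # illustrative values only
for x in (-2.0, -0.5, 0.0, 1.0, 4.0):
    print(x, relative_slope_numeric(x, k, a), -k * x / (a + x))
print("area:", area(k, a))
```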

The value of $c'$ is determined from the condition that the area
under the curve must equal 1. In doing this, it is to be noted
that negative frequencies are inadmissible so that $x$ cannot be
less than $-a$. It is therefore the area under that part of the curve
from $x = -a$ to $\infty$ that is to be equal to 1, that is, $\int_{-a}^{\infty} y\,dx = 1$.

To carry out this integration, multiply $y$ by $e^{ka}a^{ka}/e^{ka}a^{ka}$, which
of course equals 1. This puts the integral in the form

$$\int_{-a}^{\infty} y\,dx = \frac{c'\,e^{ka}}{a^{ka}} \int_{-a}^{\infty} (a + x)^{ka}\,e^{-k(a + x)}\,dx$$

If $z$ is set equal to $k(a + x)$, this becomes

$$\int_{-a}^{\infty} y\,dx = \frac{c'\,e^{ka}}{k\,(ka)^{ka}} \int_{0}^{\infty} z^{ka} e^{-z}\,dz$$

(Note that $dz = k\,dx$.) But $\int_{0}^{\infty} z^{ka} e^{-z}\,dz$ is by definition $\Gamma(ka + 1)$,
called gamma of $ka + 1$, and equals $(ka)!$.* Consequently, since
$\int_{-a}^{\infty} y\,dx$ must equal 1,

$$c' = \frac{k\,(ka)^{ka}}{e^{ka}\,(ka)!}$$

Since $y = c'$ when $x = 0$ and since $x = 0$ is the mode of the distribution
(for $dy/dx = 0$ when $x = 0$), $c'$ is designated as $y_0$ and
equals the height of the curve at the mode. Therefore, the
formula for the curve is that given in the text (page 49), viz.,

$$y = y_0\left(1 + \frac{x}{a}\right)^{ka} e^{-kx}$$

the origin being the mode of the distribution. This is Pearson's type III curve. Taking logarithms of
both sides yields

$$\log_{10} y = \log_{10} y_0 + ka \log_{10}\left(1 + \frac{x}{a}\right) - kx \log_{10} e$$

which is the form given on page 49.

* Integration of $\int_{0}^{\infty} z^{m-1} e^{-z}\,dz$ by parts gives

$$\int_{0}^{\infty} z^{m-1} e^{-z}\,dz = \Big[-z^{m-1} e^{-z}\Big]_{0}^{\infty} + (m - 1)\int_{0}^{\infty} z^{m-2} e^{-z}\,dz$$

But the first term is zero both for $z = 0$ and $z = \infty$ (when $m > 1$), and therefore

$$\int_{0}^{\infty} z^{m-1} e^{-z}\,dz = (m - 1)\int_{0}^{\infty} z^{m-2} e^{-z}\,dz$$

If $m$ is a positive integer, repetition gives

$$\int_{0}^{\infty} z^{m-1} e^{-z}\,dz = (m - 1)(m - 2)\cdots 1 \int_{0}^{\infty} e^{-z}\,dz$$

or, since $\int_{0}^{\infty} e^{-z}\,dz = \Big[-e^{-z}\Big]_{0}^{\infty} = 1$,

$$\int_{0}^{\infty} z^{m-1} e^{-z}\,dz = (m - 1)!$$

When $m$ is not a positive integer, $\int_{0}^{\infty} z^{m-1} e^{-z}\,dz$ is taken as the definition of
$(m - 1)!$. The function $\int_{0}^{\infty} z^{m-1} e^{-z}\,dz$ is called the gamma function of $m$
and is written $\Gamma(m)$. Thus $\Gamma(m) = (m - 1)!$, $\Gamma(m + 1) = m!$, etc. (see page 454).

F. Proof That the Relative Slope of the Hypergeometrical
Polygon Is Given at Any Abscissa Mid-point $X = N_1 + \tfrac{1}{2}$ by a
Formula of the Type Relative Slope $= \dfrac{X + a}{b_0 + b_1X + b_2X^2}$. By
Eq. (4) of Chap. IV (see page 54) the ordinate of the hypergeometrical
distribution at any abscissa point $N_1$ is

$$y_{N_1} = \frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{N_1!\,(p_1S - N_1)!\,(N - N_1)!\,(p_2S - N + N_1)!\,S!}$$
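and its ordinate at the abscissa point $N_1 + 1$ is

$$y_{N_1+1} = \frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{(N_1 + 1)!\,(p_1S - N_1 - 1)!\,(N - N_1 - 1)!\,(p_2S - N + N_1 + 1)!\,S!}$$

The difference between these two ordinates is

$$\Delta y_{N_1} = y_{N_1+1} - y_{N_1} = \frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{(p_1S - N_1 - 1)!\,(p_2S - N + N_1)!\,S!\,N_1!\,(N - N_1 - 1)!}\left[\frac{1}{(N_1 + 1)(p_2S - N + N_1 + 1)} - \frac{1}{(p_1S - N_1)(N - N_1)}\right]$$

Since the distance between the abscissa points is 1, this difference
is the value of the absolute slope of the side of the polygon joining
the tops of the two ordinates.

The ordinate at the abscissa mid-point $N_1 + \tfrac{1}{2}$ is equal to
$\tfrac{1}{2}(y_{N_1+1} + y_{N_1})$, which equals

$$\frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{2\,(p_1S - N_1 - 1)!\,(p_2S - N + N_1)!\,S!\,N_1!\,(N - N_1 - 1)!}\left[\frac{1}{(N_1 + 1)(p_2S - N + N_1 + 1)} + \frac{1}{(p_1S - N_1)(N - N_1)}\right]$$

and the relative slope is the ratio of the absolute slope to the
value of this mid-ordinate. That is,

$$\text{Relative slope} = \frac{2\left[(p_1S - N_1)(N - N_1) - (p_2S - N + N_1 + 1)(N_1 + 1)\right]}{(p_1S - N_1)(N - N_1) + (p_2S - N + N_1 + 1)(N_1 + 1)}$$

which reduces to

$$\text{Relative slope} = \frac{2\left[N + Np_1S - p_2S - 1 - N_1(S + 2)\right]}{2N_1^2 + \left[(p_2 - p_1)S - 2N + 2\right]N_1 + Np_1S + p_2S - N + 1}$$

If now $X$ is set equal to $N_1 + \tfrac{1}{2}$, the foregoing expression for
the relative slope may be put in the form

$$\text{Relative slope} = \frac{-\left[X - \left(\dfrac{(N + 1)(1 + p_1S)}{S + 2} - \dfrac{1}{2}\right)\right]}{\dfrac{2Np_1S + S + 1}{4(S + 2)} + \dfrac{(p_2 - p_1)S - 2N}{2(S + 2)}\,X + \dfrac{X^2}{S + 2}}$$

which is equivalent to

$$\text{Relative slope} = \frac{X + a}{b_0 + b_1X + b_2X^2}$$

where

$$a = \frac{1}{2} - \frac{(N + 1)(1 + p_1S)}{S + 2}, \qquad b_0 = -\frac{2Np_1S + S + 1}{4(S + 2)}, \qquad b_1 = -\frac{(p_2 - p_1)S - 2N}{2(S + 2)}, \qquad b_2 = -\frac{1}{S + 2}$$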
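The reduction can be verified numerically for particular values. The Python sketch below is an added illustration with hypothetical function names, and it assumes that $p_1S$ is a whole number. It computes the relative slope of the hypergeometrical polygon from the exact ordinates. It then compares the result with $(X + a)/(b_0 + b_1X + b_2X^2)$ using the constants just given. The values $S = 52$, $p_1 = .25$, $N = 10$ are chosen only for illustration.

```python
from math import comb


def hypergeometric(N, n1, p1, S):
    """Ordinate y_{N1}: chance of n1 items of the first kind in a sample of N
    drawn without replacement from S items, p1*S of which are of that kind."""
    K = round(p1 * S)  # p1*S is assumed to be a whole number
    return comb(K, n1) * comb(S - K, N - n1) / comb(S, N)


def polygon_relative_slope(N, n1, p1, S):
    """Absolute slope between n1 and n1 + 1 divided by the mid-ordinate."""
    y0 = hypergeometric(N, n1, p1, S)
    y1 = hypergeometric(N, n1 + 1, p1, S)
    return (y1 - y0) / ((y0 + y1) / 2)


def pearson_form(N, n1, p1, S):
    """(X + a) / (b0 + b1 X + b2 X^2) at X = n1 + 1/2, constants as in the text."""
    p2 = 1 - p1
    X = n1 + 0.5
    a = 0.5 - (N + 1) * (1 + p1 * S) / (S + 2)
    b0 = -(2 * N * p1 * S + S + 1) / (4 * (S + 2))
    b1 = -((p2 - p1) * S - 2 * N) / (2 * (S + 2))
    b2 = -1 / (S + 2)
    return (X + a) / (b0 + b1 * X + b2 * X ** 2)


S, p1, N = 52, 0.25, 10  # illustrative values only
for n1 in range(N):
    print(n1, polygon_relative_slope(N, n1, p1, S), pearson_form(N, n1, p1, S))
```

The comparison is restricted to values of $N_1$ for which both ordinates are positive, since only there are all the factorials in the derivation defined.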


G. The Criterion for the Hypergeometrical Distribution. The
character of the hypergeometrical distribution will depend on
the value of the discriminant $b_1^2 - 4b_0b_2$. From the results of
the previous section, this is seen to be

$$b_1^2 - 4b_0b_2 = \frac{\left[(p_2 - p_1)S - 2N\right]^2 - 4(2Np_1S + S + 1)}{4(S + 2)^2}$$

The value of the discriminant is thus related to the values of
$p_1$, $p_2$, $N$, and $S$. It is positive if $N/S$ lies outside the limits

$$\frac{1}{2} \pm \sqrt{p_1p_2 + \frac{1}{S}\left(1 + \frac{1}{S}\right)}$$

and it is negative if $N/S$ lies within these limits. In the first case, the hypergeometrical distribution
is approximated by a type VI curve (see pages 58, 135); in the
second case, by a type IV curve.

In the example employed in the text, $S = 52$ and $N = 10$, so that
$N/S = 10/52$, or about .19, while for $S = 52$ the limits become
$\tfrac{1}{2} \pm \sqrt{p_1p_2 + 53/2704}$. Since $N/S$ lies within these limits
for the values of $p_1$ and $p_2$ used there, the hypergeometrical
distribution given by these values of $p_1$, $p_2$, $S$, and $N$ is approximated
by a type IV curve.

It will be noted that $N$, $p_1$, and $S$ are all positive quantities so
that the second term of the discriminant (without the minus
sign) must be positive. That is, $b_0$ and $b_2$ have the same sign, so that the
roots of the equation $b_0 + b_1x + b_2x^2 = 0$ cannot be real and of opposite sign.
Consequently, Pearsonian curves of type I cannot be derived from a
hypergeometrical distribution (see footnote to page 58).
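The criterion lends itself to a short computation. The Python sketch below is added for illustration and uses hypothetical names. It evaluates the discriminant and the limits $\tfrac12 \pm \sqrt{p_1p_2 + (1/S)(1 + 1/S)}$, and it reports which Pearsonian type the criterion indicates. The values $p_1 = .25$ and $p_1 = 1/13$ are assumptions chosen for illustration, not necessarily those of the example in the text.

```python
import math


def discriminant(N, p1, S):
    """b1^2 - 4 b0 b2 for the hypergeometrical relative-slope formula."""
    p2 = 1 - p1
    return (((p2 - p1) * S - 2 * N) ** 2 - 4 * (2 * N * p1 * S + S + 1)) / (4 * (S + 2) ** 2)


def limits(p1, S):
    """Values of N/S at which the discriminant vanishes."""
    p2 = 1 - p1
    r = math.sqrt(p1 * p2 + (1 / S) * (1 + 1 / S))
    return 0.5 - r, 0.5 + r


def approximating_type(N, p1, S):
    """'VI' when the discriminant is positive, 'IV' when it is negative."""
    return "VI" if discriminant(N, p1, S) > 0 else "IV"


# S = 52 and N = 10 as in the text; the values of p1 are assumed.
for p1 in (0.25, 1 / 13):
    print(p1, limits(p1, 52), approximating_type(10, p1, 52))
```

With $S = 52$ and $N = 10$, the sketch reports type IV for $p_1 = .25$. It reports type VI for $p_1 = 1/13$, for which the lower limit (about .199) slightly exceeds $N/S$.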


CHAPTER V
THE GRAM-CHARLIER SYSTEM OF FREQUENCY CURVES

The introduction of a possible variation in the contribution
of a causal factor at the end rather than at the beginning of the
argument may be considered a weak point of the Pearsonian
analysis. It is therefore of interest to consider another approach
to the theory of frequency curves that assumes variable contributions
from the start. Such an approach is that which gives
rise to the Gram-Charlier system of frequency curves.¹

Derivation of the Gram-Charlier Formula. The assumptions
on which the Gram-Charlier system of frequency curves is based
are similar in many respects to those from which the skew
binomial and Pearson's type III curve were derived. The first
fundamental assumption is that the fluctuations in a given
variable $X$ are the algebraic sum of the contributions of a number
of causal factors. Thus the deviations of $X$ from some central
value are assumed to be equal to $e_1 + e_2 + \cdots + e_N$, where the
$e$'s may take on either positive or negative values. The significance
of this assumption is that the contributions of the various
causes are additive rather than multiplicative or related in some
more complex way. The second principal assumption is that
the contribution of each cause is independent of the contributions
of other causes.

The two assumptions above are essentially the same as those
made by Pearson in the derivation of his type III curve; but a
third fundamental assumption differs from Pearson's. In the
case of the binomial distribution it was assumed that each contributory
cause could add or subtract only a fixed amount to the
variable (only an H or a T was possible). Under the Gram-Charlier
system, it is assumed that the contributions of each
causal factor can take on various values, the probability of any
given value being given by a distribution the form of which is
fixed, but not necessarily known. These distributions of probability
are allowed to vary from cause to cause; they may be
discrete or continuous and are not subject to any restrictions
other than that the probability of very large positive or negative
contributions shall be practically zero, i.e., that the distributions
taper off to zero in both directions.

¹ So called because J. P. Gram and C. V. L. Charlier were principally
responsible for its development. See, for example, J. P. Gram, Om
Raekkeudviklinger (Copenhagen, 1879) (Doctor's Dissertation); and "Über die
Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten
Quadrate," Journal für die reine und angewandte Mathematik, Vol. 94
(1883), pp. 41-73. Also see C. V. L. Charlier, "Über das Fehlergesetz,"
Arkiv för Matematik, Astronomi och Fysik, Vol. 2, No. 8 (1905), pp. 1-9; and
"Über die Darstellung willkürlicher Functionen," Arkiv för Matematik,
Astronomi och Fysik, Vol. 2, No. 20 (1905), pp. 1-35.

On the basis of these assumptions the Gram-Charlier analysis¹
shows that, if the number of contributory causes is relatively
large, then the distribution of the resultant variable $X$ bears
certain approximate relationships to the distributions of the
individual contributory elements $e$. In setting up these relationships
it makes use of certain quantities, called "cumulants"
or "semi-invariants," that are directly related to the moments
of a distribution. Thus, if $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ are the first four
moments of a distribution about its mean, then $k_1 = \mu_1$, $k_2 = \mu_2$,
$k_3 = \mu_3$, and $k_4 = \mu_4 - 3\mu_2^2$, where the $k$'s are the first four
cumulants of the distribution. Both $k_1$ and $\mu_1$ are 0, of course,
when the moments are measured about the mean. It is obvious
that $k_2 = \sigma^2$, that $k_3$ is related to the skewness of the distribution,
and that $k_4$ is related to its kurtosis. When $k_3$ is 0, the distribution
is symmetrical; when $k_4$ is 0, its kurtosis is 3.

By adopting certain mathematical approximations, the Gram-Charlier
analysis shows that the cumulants of the distribution
of $X$ are the sums of the cumulants of the distributions of the
individual contributory elements. That is, if $k_{12}$ is the second
cumulant of the distribution of the first contributory element
$e_1$, $k_{22}$ the second cumulant of the second contributory element
$e_2$, $k_{23}$ the third cumulant of the second contributory element,
etc., and if $K_2$, $K_3$, and $K_4$ are the cumulants of the distribution
of $X$, then

$$K_2 = k_{12} + k_{22} + k_{32} + \cdots + k_{N2}$$
$$K_3 = k_{13} + k_{23} + k_{33} + \cdots + k_{N3}$$
$$K_4 = k_{14} + k_{24} + k_{34} + \cdots + k_{N4}$$

¹ For a fuller discussion of the Gram-Charlier analysis see the Appendix,
pp. 92-99.
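Both the relations between cumulants and moments and the additivity of cumulants can be checked exactly for simple discrete contributions. The Python sketch below uses exact fractions, and its three contributory elements are hypothetical. For independent contributions the sums hold exactly, so the comparison is one of exact equality. A mirrored copy of the first element illustrates how a positive and a negative third cumulant can offset each other.

```python
from fractions import Fraction as F
from itertools import product


def cumulants(dist):
    """(k2, k3, k4) of a discrete distribution {value: probability}:
    k2 = mu2, k3 = mu3, k4 = mu4 - 3*mu2**2, moments about the mean."""
    mean = sum(x * p for x, p in dist.items())
    mu2, mu3, mu4 = (sum((x - mean) ** r * p for x, p in dist.items()) for r in (2, 3, 4))
    return mu2, mu3, mu4 - 3 * mu2 ** 2


def convolve(d1, d2):
    """Distribution of e1 + e2 for independent contributions e1 and e2."""
    out = {}
    for (x1, q1), (x2, q2) in product(d1.items(), d2.items()):
        out[x1 + x2] = out.get(x1 + x2, 0) + q1 * q2
    return out


# Hypothetical contributory elements with exact probabilities.
e1 = {-1: F(3, 4), 3: F(1, 4)}               # positively skewed
e2 = {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)}    # also positively skewed
e3 = {-2: F(1, 3), 0: F(1, 3), 2: F(1, 3)}   # symmetrical
e1_mirror = {-x: p for x, p in e1.items()}   # negatively skewed mirror of e1

X = convolve(convolve(e1, e2), e3)
K = cumulants(X)
total = tuple(sum(c) for c in zip(cumulants(e1), cumulants(e2), cumulants(e3)))
print(K, total)
```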
These relationships form the basis of the Gram-Charlier
analysis. Suppose, it is argued, that the variance $K_2$ of the
resultant variable $X$ is finite, as it is in most practical cases, and
suppose that the variances, $k_2$'s, of the individual contributory
elements are all of about the same order of magnitude. Then,
since there are $N$ contributory elements, the variance $k_2$ of each
of them is of about the size of $K_2/N$, or, as the mathematicians
say, of the order of $1/N$, and its standard deviation $\sqrt{k_2}$ is of
the order of $1/\sqrt{N}$. This means that the average variation in
each contributory element is of order $1/\sqrt{N}$. Hence the third
and fourth moments (and therefore the third and fourth cumulants),
which involve the third and fourth powers of this variation,
will be of order $1/N^{3/2}$ and $1/N^2$, respectively. Consequently,
the third and fourth cumulants of $X$, $K_3$ and $K_4$, which
are equal, respectively, to the sums of the third and fourth cumulants
of the $N$ contributory elements, will be of order $1/\sqrt{N}$
and $1/N$.
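The orders of magnitude in this argument can be illustrated with a hypothetical case. The Python sketch below assumes that every contributory element has the same shape, that of an exponential distribution with unit mean, whose cumulants are $k_2 = 1$, $k_3 = 2$, $k_4 = 6$. Each element is rescaled by $1/\sqrt{N}$ so that $K_2 = 1$. The sketch relies on two standard facts: rescaling a variable by $s$ multiplies its $r$th cumulant by $s^r$, and the cumulants of independent elements add.

```python
import math

# Cumulants of a hypothetical unit-variance shape for each contribution:
# an exponential distribution with mean 1 has k2 = 1, k3 = 2, k4 = 6.
SHAPE = {2: 1.0, 3: 2.0, 4: 6.0}


def cumulants_of_sum(N):
    """K2, K3, K4 of X = e1 + ... + eN, each e_j being the shape above
    rescaled by 1/sqrt(N), so that K2 stays equal to 1."""
    s = 1 / math.sqrt(N)
    return {r: N * SHAPE[r] * s ** r for r in (2, 3, 4)}


for N in (1, 4, 16, 64, 256):
    K = cumulants_of_sum(N)
    # K3 * sqrt(N) and K4 * N stay constant: K3 is of order 1/sqrt(N), K4 of order 1/N.
    print(N, K[2], K[3], K[4], K[3] * math.sqrt(N), K[4] * N)
```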


The Gram-Charlier analysis shows further that, if $N$ is so large that
terms of order $1/\sqrt{N}$ and $1/N$ are negligible, and if the contributions
are of the same order of magnitude, then the
distribution of $X$ will be approximately normal in form.


If $N$ is not large enough to make terms of order $1/\sqrt{N}$ or
$1/N$ negligible but is still large enough to make terms of higher
order (such as terms of order $1/N^{3/2}$) practically zero, then the
Gram-Charlier analysis shows that the form of the distribution
of $X$ will be given approximately by

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}\left[1 + \frac{A}{\sigma^3}\left(\frac{3x}{\sigma} - \frac{x^3}{\sigma^3}\right) + \frac{B}{\sigma^4}\left(3 - \frac{6x^2}{\sigma^2} + \frac{x^4}{\sigma^4}\right)\right] \qquad (1)$$

where $x = X - \bar{X}$, $A = -\dfrac{\mu_3}{3!}$, and $B = \dfrac{\mu_4 - 3\mu_2^2}{4!}$.

This is the more general Gram-Charlier formula for a frequency
curve. The normal curve is the special case for which $A$ and $B$
both equal 0; all other curves are nonnormal.
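Eq. (1) can be evaluated directly. The Python sketch below uses hypothetical names and the arbitrary values $\sigma = 2$, $\mu_3 = 1.5$, $\mu_4 = 55$. It codes the curve and checks by Simpson's rule that the curve has unit area, mean 0, and second, third, and fourth moments $\sigma^2$, $\mu_3$, and $\mu_4$.

```python
import math


def gram_charlier_a(x, sigma, mu3, mu4):
    """Ordinate of the type A curve of Eq. (1) at x = X - mean, with
    A = -mu3/3! and B = (mu4 - 3*sigma**4)/4!."""
    z = x / sigma
    A = -mu3 / math.factorial(3)
    B = (mu4 - 3 * sigma ** 4) / math.factorial(4)
    normal = math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))
    return normal * (1
                     + A / sigma ** 3 * (3 * z - z ** 3)
                     + B / sigma ** 4 * (3 - 6 * z ** 2 + z ** 4))


def moment(r, sigma, mu3, mu4, n=4000):
    """r-th moment about x = 0 by Simpson's rule on [-12 sigma, 12 sigma]."""
    lo, hi = -12 * sigma, 12 * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * x ** r * gram_charlier_a(x, sigma, mu3, mu4)
    return total * h / 3


sigma, mu3, mu4 = 2.0, 1.5, 55.0  # illustrative values (3 * sigma**4 = 48)
print([round(moment(r, sigma, mu3, mu4), 6) for r in range(5)])
```

The truncated curve need not stay positive. With $\sigma = 1$, $\mu_3 = 1.5$, and $\mu_4 = 3$, for instance, the bracket equals $1 - .25 \times 18 = -3.5$ at $x = -3\sigma$.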

Components of a Gram-Charlier Curve. The effect of the
additional A and B terms on the shape of the distribution of
$X$ is illustrated in Fig. 17. Here the ordinary normal curve is
represented by curve I. If $A$ is negative, the effect of the A
term is to subtract from the normal ordinate for positive values of
$x/\sigma$ an increasing percentage of that ordinate up to $x/\sigma = 1$,
then a decreasing percentage up to $x/\sigma = \sqrt{3}$, and thereafter
to add a small amount. The opposite is true for negative values
of $x/\sigma$. The fluctuations in the amounts added and subtracted
are illustrated by curve II of Fig. 17, for which
$A/\sigma^3 = -.2$. Since, when $A$ is negative, the effect of the A
term is to subtract from the right of the mean and to add to the
left of the mean (at least in its immediate neighborhood), the
result is a transformation of the normal curve into a positively
skewed curve (cf. Fig. 18). If $A$ is positive, the effect of the A
term is to add to the right of the mean and to subtract from the
left (cf. Fig. 21), at least within the immediate neighborhood, and
thus to transform the normal curve into a negatively skewed
curve (cf. Fig. 22).

Fig. 17.—Components of a Gram-Charlier frequency curve. (I, normal curve; II, $-.2\left(3\frac{x}{\sigma} - \frac{x^3}{\sigma^3}\right)$ times normal curve; III, $.1\left(3 - 6\frac{x^2}{\sigma^2} + \frac{x^4}{\sigma^4}\right)$ times normal curve.)
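The percentages described above can be tabulated. In the Python sketch below, which uses hypothetical names, the fraction of the normal ordinate contributed by the A term at $z = x/\sigma$ is $(A/\sigma^3)(3z - z^3)$. With $A/\sigma^3 = -.2$, as for curve II, this fraction is most negative at $z = 1$, returns to zero at $z = \sqrt{3}$, and is positive beyond.

```python
import math


def a_term_fraction(z, a_over_sigma3):
    """Fraction of the normal ordinate added at z = x/sigma by the A term of
    Eq. (1); a negative value means that fraction is subtracted."""
    return a_over_sigma3 * (3 * z - z ** 3)


# Curve II of Fig. 17 uses A/sigma^3 = -.2.
for z in (0.5, 1.0, 1.5, math.sqrt(3), 2.0, 2.5):
    print(f"z = {z:5.3f}  fraction = {a_term_fraction(z, -0.2):+.3f}")
```

The factor $3z - z^3$ is odd in $z$, which is why the effect on the left of the mean mirrors that on the right with the sign reversed.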
This dependence of the skewness of the distribution of $X$ on
the sign of $A$ was to be expected. For, as indicated above,
$A = -K_3/3!$ and $K_3$ itself is a sum of the cumulants $k_{13}$, $k_{23}$,
etc., which are the third moments of the distributions of the
individual contributions. Consequently, if the distributions of
the individual contributions are all positively skewed, their
third moments, and hence $k_{13}$, $k_{23}$, etc., will all be positive; $K_3$,
which equals $k_{13} + k_{23} + k_{33} + \cdots + k_{N3}$, will be positive;
$A$ will be negative; and the distribution of $X$ will be positively
skewed. That is, if the distributions of the individual contributions,
$e$, are all positively skewed, then $X$ itself will be
positively skewed. Just the opposite is true if the individual
contributions are all negatively skewed. If some of the individual
contributions are positively skewed and others are
negatively skewed, then the value of $K_3$, and hence the skewness
of $X$, will depend on whether the positive or negative influences
are predominant. Since the contributions are presumed to be
of approximately the same order of magnitude, the result will
depend primarily on the relative number of positively skewed
and negatively skewed contributions.

Fig. 18.—Combination of curves I and II shown in Fig. 17.

Fig. 19.—Combination of curves I and III shown in Fig. 17.

Fig. 20.—Combination of curves I, II, and III shown in Fig. 17.

Fig. 21.—Effect upon normal curve of additions and subtractions of types II′ and III′. (I, normal curve; II′, $.2\left(3\frac{x}{\sigma} - \frac{x^3}{\sigma^3}\right)$ times normal curve; III′, $-.1\left(3 - 6\frac{x^2}{\sigma^2} + \frac{x^4}{\sigma^4}\right)$ times normal curve.)
The effect of the B term on the distribution of $X$ is shown
graphically in Figs. 17, 19, 21, and 23. If $B$ is positive, the effect
of the B term is to add a maximum percentage to the ordinate
of the normal curve at $x/\sigma = 0$. This percentage decreases
in size until $x/\sigma$ equals $-.74$ and $+.74$, respectively, from
which points an increasing percentage is subtracted from the
normal ordinate. The maximum percentage subtraction is
attained at $x/\sigma = -\sqrt{3}$ and $+\sqrt{3}$, after which a decreasing
percentage is subtracted until $x/\sigma = -2.33$ and $+2.33$.
From those points on, a small amount is added to the normal
ordinates.

These fluctuations in the percentages added and subtracted
are illustrated by curve III of Fig. 17, for which $B/\sigma^4$ is taken
equal to $+.10$. When $B$ is positive, the effect of the B term is
to add to the normal curve in the immediate vicinity of the mean,
to subtract from it at intermediate distances from the mean, and
then to add to it again on the tails of the distribution; the result
is to give the final curve a greater than normal peakedness.
That is, if $B$ is positive, the distribution of $X$ will have a coefficient
of kurtosis greater than 3 (cf. Fig. 19). Just the opposite
is true if $B$ is negative. In this case subtractions are made
from the normal curve within the immediate vicinity of the mean,
additions are made at intermediate distances, and subtractions
are again made out on the tails (cf. Fig. 21). The result is to
produce less than normal peakedness (cf. Fig. 23). If $B$ is
negative, therefore, the distribution of $X$ will have a coefficient
of kurtosis less than 3.

Fig. 22.—Combination of curves I and II′ shown in Fig. 21.

Fig. 23.—Combination of curves I and III′ shown in Fig. 21.
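The turning points quoted above follow from the B-term factor $3 - 6z^2 + z^4$, where $z = x/\sigma$. This factor vanishes where $z^2 = 3 \mp \sqrt{6}$ and has its minimum where $z^2 = 3$. The short Python sketch below, which uses hypothetical names, computes these points.

```python
import math


def b_term_fraction(z, b_over_sigma4):
    """Fraction of the normal ordinate added at z = x/sigma by the B term of
    Eq. (1); a negative value means that fraction is subtracted."""
    return b_over_sigma4 * (3 - 6 * z ** 2 + z ** 4)


# 3 - 6z^2 + z^4 = 0 gives z^2 = 3 -/+ sqrt(6); the minimum lies at z^2 = 3.
inner = math.sqrt(3 - math.sqrt(6))
outer = math.sqrt(3 + math.sqrt(6))
print(round(inner, 2), round(outer, 2), round(math.sqrt(3), 2))  # 0.74 2.33 1.73
```

With $B/\sigma^4 = +.10$, as for curve III, the largest subtraction, at $z = \sqrt{3}$, is $.6$ of the normal ordinate.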


Again this result was to be expected. For the value of $B$
depends on the value of $K_4$, and this in turn depends on the
values of $k_{14}$, $k_{24}$, etc., which are proportional to the excess above 3 or the
deficit below 3 of the coefficients of kurtosis of the distributions of
the individual contributions. Thus if the distributions of the
individual contributions, $e$, are all more peaked than normal,
i.e., if the $k_4$'s are all positive, then $K_4$, which equals
$k_{14} + k_{24} + k_{34} + \cdots + k_{N4}$,
will also be positive, $B$ will be positive, and the distribution of $X$
will be more peaked than normal. The opposite is true if the
distributions of the individual contributions are all less peaked
than normal. If some of them are more peaked than normal
and others are less peaked, then the peakedness of the distribution
of $X$ will depend upon the predominance of excess or deficit
influences; and since the individual contributions are all assumed
to be of approximately the same order of magnitude, this predominance
will be primarily determined by the relative number of
excess and deficit influences.

Fig. 24.—Combination of curves I, II′, and III′ shown in Fig. 21.

Usually the A and B terms are the only additional terms
deemed worthy of consideration. For unless the number of
causal factors, $N$, is relatively small, higher-order terms will be
of negligible importance in determining the distribution of $X$,
continuing to assume, of course, that the contributions of the
various causal factors are approximately of the same order of
magnitude.¹ In the practical work of fitting a Gram-Charlier
curve, it also becomes difficult to determine with any degree of
accuracy the values of these higher-order terms.² For these
reasons, Eq. (1) is usually considered a sufficiently general equation.
In special cases of very skewed distributions, somewhat
better results may be obtained by a type of mathematical approximation
that gives rise to a different equation³ from Eq. (1).
Gram-Charlier distributions of this latter kind are called "type
B distributions" to distinguish them from those given by Eq. (1),
which are called "type A distributions."

¹ Failure to include higher-order terms, however, may sometimes produce
negative frequencies in certain parts of a Gram-Charlier curve, which is of
course a practical impossibility. Such, for instance, is seen to be the case in
Figs. 18, 22, and 24.

² The difficulty is that the sampling fluctuations in these higher-order
terms are especially great.

³ For a discussion of this special variation in the analysis, see C. V. L.
Charlier, "Über die Darstellung willkürlicher Functionen," Arkiv för
Matematik, Astronomi och Fysik, Vol. 2, No. 20 (1905), pp. 1-35. An elementary
discussion in English is to be found in H. L. Rietz, Mathematical
Statistics, Chap. VII.

GENERAL SIGNIFICANCE OF THE GRAM-CHARLIER ANALYSIS

The principal significance of the Gram-Charlier analysis for
the theory of frequency curves is the explanation it affords of
nonnormal frequency curves when the number of contributory
causes is limited. The Pearsonian analysis, it will be recalled,⁴
was strictly valid for nonnormal distributions of discrete variables,
but its explanation of continuous nonnormal frequency
curves depended upon a tardily introduced assumption of
"lumpiness" in the contributions of the causal factors, a procedure
that was not entirely satisfactory since the analysis
was originally based upon the assumption of definitely fixed
contributions.

⁴ See p. 63.

The Gram-Charlier analysis assumes at the very beginning
that the contributions of a causal factor themselves vary continuously,
the probabilities of various possible values being given
by a distribution of a definite but unknown form. It then goes
on to show that if a given variable $X$ is an algebraic sum of the
contributions $e$ of a number of such causal factors, if the contributions
are approximately of the same order of magnitude
(i.e., if the standard deviations of the contributions are about
equal), and if they are independent of each other, then the form
of the distribution of $X$ will depend partly on the forms of the
distributions of the individual contributions and partly on the
number of such contributory causes. Thus, if the number of
causal factors is very large, the distribution of $X$ will be approximately
normal, whatever the forms of the distributions of the
individual contributions. If the number of causal factors is
only moderately large, however, but their contributions are still
of the same order of magnitude, the skewness and kurtosis of the
distributions of the individual contributions will tend to produce
a skewness and kurtosis in the distribution of $X$. In special
cases the skewness of the individual contributions may be compensatory.
Thus it is possible for the distribution of $X$ to be
symmetrical, not only when the distributions of the individual
contributions are themselves all symmetrical, but also when the
skewness of one or more contributions is offset by a contrary
skewness of one or more other contributions. Similar remarks
may be made with respect to the kurtosis of $X$.

These conclusions are much the same as those drawn from the
Pearsonian analysis pertaining to the asymmetrical binomial
distribution. In fact, this part of the Pearsonian analysis may
be considered a very special case of the Gram-Charlier analysis.
Suppose, for example, that, in the latter, distributions of the
individual contributions are all assumed to be of a very special
sort. Suppose that in each case only one specified positive
contribution and one specified negative contribution are possible
and that these have the same absolute value. Let the probability
with which the contribution takes on its positive value differ from
that with which it takes on its negative value. Furthermore, let
these distributions of the individual contributions be all alike,
both as to the amounts of the positive and negative values and
as to the probabilities of each. Then, under these very special
assumptions, the Gram-Charlier analysis will yield the asymmetrical
binomial distribution. The latter is thus a special
case of a Gram-Charlier distribution.

Finally, it is to be noted that the Gram-Charlier analysis does 
not, as does the Pearsonian analysis, relax the assumption of 
independence of the contributory causes. The latter, it will be 
recalled, laid aside this assumption and permitted the contribu- 
tion of a causal factor to be dependent on the contributions of 
other factors. The skewness and kurtosis yielded by the general 
Pearsonian formula were in part to be attributed to this assump- 
tion of dependence. The Gram-Charlier analysis assumes 
throughout that the contribution of any causal factor is independent
of the contributions of other causal factors. It is
restricted in this respect in the explanation that it affords of
actual frequency distributions.


APPENDIX 


DERIVATION OF THE GRAM-CHARLIER FORMULA FOR A TYPE A 
FREQUENCY DISTRIBUTION 


The following is a more detailed discussion of the Gram-Charlier
system of frequency curves than was given in the main
body of the text. Its principal purpose is to outline the argument
by which the formula for a type A frequency distribution
[Eq. (1)] is derived. For the sake of those not well acquainted
with higher mathematics, the essential steps are presented in a
more or less nonmathematical manner, while the mathematical
basis for each is sketched in a series of footnotes. For a fuller
mathematical analysis, the reader is referred to the original works
of Gram and Charlier (see footnote to page 82) or to the English
accounts of them given by Arne Fisher's Mathematical Theory
of Probabilities or H. L. Rietz's Mathematical Statistics.¹ The
present discussion follows closely that outlined in E. T. Whittaker
and G. Robinson, The Calculus of Observations,² pages 168 to 174.

¹ Also see THIELE, T. N., Theory of Observations, first published in 1903
(C. and E. Layton, London) and recently reprinted in the Annals of
Mathematical Statistics, Vol. 3 (1933), pp. 165-308.
The cornerstone of the Gram-Charlier analysis is the Fourier
integral theorem. This, so far as the argument in question is
concerned, states that if a certain function of a given variable,
called its "moment generating function," is known, then the
frequency distribution of the variable itself may be determined,
and conversely, if the distribution is known, its moment generating
function may be found.³ This moment generating function is
of the form $G = \sum e^{i\theta x} f(x)\,dx$, where $x = X - \bar{X}$, $f(x)\,dx$ is the
distribution of $x$, $e$ is the constant 2.718+, $i = \sqrt{-1}$, and $\theta$ is an
arbitrary variable. $G$ is called the moment generating function
because, if certain operations are performed on it with reference
to $i\theta$ and then $\theta$ is given the value of 0, the results are the various
moments of the distribution of $x$.⁴

The significance of the moment generating function for the
present analysis is that if a variable $X$ is the sum of a number of
other variables, the values of any one of which are independent
of the values assumed by the others, then the moment generating
function of the composite variable $X$ is simply the product of
the moment generating functions of the independent variables.⁵

Thus, with reference to the given problem, if variations in $X$
are the sum of the contributions of certain causal factors, all
acting independently of each other, the moment generating
function of the frequency distribution of $X$, the form of which
is being sought, is the product of the moment generating functions
of the frequency (i.e., probability) distributions of the
individual contributory factors. Symbolically,

$$G_X = G_1 G_2 \cdots G_N$$

² 2d ed., Blackie & Son, Ltd., Glasgow, 192.

³ More exactly, if the moment-generating function is defined as

$$G(\theta) = \int_{-\infty}^{\infty} f(x)\,e^{i\theta x}\,dx,$$

where $i = \sqrt{-1}$ and $f(x)\,dx$ represents the frequency distribution of $x$, then

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} G(\theta)\,e^{-i\theta x}\,d\theta,$$

and vice versa.

⁴ When $x$ is distributed in the form of a smooth continuous curve, $G$ is
defined more exactly by the formula of the previous footnote.
Thus, if $G$ is differentiated with respect to $i\theta$ and $\theta$ is set equal to 0,
the result is the first moment of $f(x)$. If $G$ is differentiated twice and then $\theta$
set equal to 0, the result is the second moment of $f(x)$, etc. This is frequently
the simplest method of obtaining equations for the moments of a
known distribution.

⁵ The reason is that $e^{a+b} = e^a e^b$. Cf. WHITTAKER and ROBINSON, op. cit., p. 190.
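The product rule can be illustrated with two hypothetical discrete contributions. The Python sketch below uses complex exponentials from the standard `cmath` module to evaluate $G(\theta) = \sum e^{i\theta x}p(x)$. It then compares the generating function of the sum with the product of the individual ones. Finally, it recovers the first two moments about the origin by numerical differentiation with respect to $i\theta$ at $\theta = 0$. The distributions and helper names are assumptions made for the example.

```python
import cmath
from itertools import product


def gen_fn(dist, theta):
    """G(theta) = sum over x of e^{i theta x} p(x) for a discrete {x: p}."""
    return sum(p * cmath.exp(1j * theta * x) for x, p in dist.items())


def convolve(d1, d2):
    """Distribution of the sum of two independent discrete variables."""
    out = {}
    for (x1, q1), (x2, q2) in product(d1.items(), d2.items()):
        out[x1 + x2] = out.get(x1 + x2, 0.0) + q1 * q2
    return out


e1 = {-1: 0.75, 3: 0.25}          # hypothetical contribution of cause 1
e2 = {0: 0.5, 1: 0.25, 2: 0.25}   # hypothetical contribution of cause 2
X = convolve(e1, e2)

for theta in (0.3, 1.1, 2.5):
    print(theta, gen_fn(X, theta), gen_fn(e1, theta) * gen_fn(e2, theta))

# Differentiating with respect to i*theta and then putting theta = 0
# gives the moments about the origin (here by central differences).
h = 1e-4
first = ((gen_fn(X, h) - gen_fn(X, -h)) / (2 * h) / 1j).real
second = ((gen_fn(X, h) - 2 * gen_fn(X, 0.0) + gen_fn(X, -h)) / h ** 2 / (1j) ** 2).real
print(first, second)
```

Here the exact values are $E(X) = .75$ and $E(X^2) = 4.25$, which the finite differences reproduce to several decimal places.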


This relationship between the generating function of the sum 
and that of the individual contributions is of fundamental impor- 
tance, but it does not of itself advance the argument. For there 
is nothing in the given assumptions that provides any knowledge 
about the generating functions of the individual contributory 
factors. Hence the relationship of the latter to the generating 
function of X can of itself give no information about the nature 
of this resultant generating function. 

The next step in the argument, however, 
results. This is to note that the logarithm of a moment generat- 
ing function can be represented by an infinite series, the terms of 
which consist of rising powers of 10 and certain quantities that 
depend on the distribution to which the generating function 
relates. Thus if fi (e1) dei represents the probability distribution of 


yields more tangible 


cause 1 and G, = Y flees de; is its moment generating function, 
then the logarithm of С, can be put in the form 
Ж 02 ji)? 05 
log Gy = 0k11 gj Eie T e kis afk 11 kı4 fa уза (1) 


sae " . = 
where kıı, Ка», etc., are quantities that are called “semivariants 


or “cumulants” and, if e, is measured from its mean, are related 
to the moments of Хе) de, as follows: 


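¹ The moment-generating function $G_1 = \sum e^{i\theta\epsilon_1} f_1(\epsilon_1)\,d\epsilon_1$ is a function of $i\theta$ and may be written $G_1(i\theta)$. Let $\log G_1(i\theta)$ be called $K_1(i\theta)$; then $K_1(i\theta)$ may be expanded in a power series in $i\theta$ (Taylor's expansion), in the neighborhood of $i\theta = 0$, as follows:

$$K_1(i\theta) = K_1(0) + K_1'(0)\,i\theta + K_1''(0)\,\frac{(i\theta)^2}{2!} + \cdots$$

where $K_1'$, $K_1''$, etc., represent derivatives with respect to $i\theta$. The problem is to find $K_1(0)$, $K_1'(0)$, $K_1''(0)$, .... Since $G_1(i\theta) = 1$ when $\theta = 0$ [note that the sum from $-\infty$ to $+\infty$ of the probabilities $f_1(\epsilon_1)\,d\epsilon_1$ is 1], it follows that $\log G_1(0) = 0$, i.e., $K_1(0) = 0$. Again

$$\frac{d \log G_1(i\theta)}{d(i\theta)} = \frac{1}{G_1(i\theta)}\,\frac{dG_1(i\theta)}{d(i\theta)}$$

or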
$$k_{11} = \mu_1 = 0, \quad k_{12} = \mu_2, \quad k_{13} = \mu_3, \quad k_{14} = \mu_4 - 3\mu_2^2, \quad \text{etc.}$$

The same can be done for the logarithms of the generating functions of the other contributions, and since the logarithm of a product is the sum of the logarithms of the various factors, the logarithm of the generating function of X can be represented as the sum of the N infinite series representing the logarithms of the


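$$K_1'(i\theta) = \frac{G_1'(i\theta)}{G_1(i\theta)}$$

and

$$G_1(i\theta)\,K_1'(i\theta) = G_1'(i\theta)$$

Furthermore, by expanding $e^{i\theta\epsilon_1}$ in a power series, it follows that

$$G_1(i\theta) = \sum f_1(\epsilon_1)\,d\epsilon_1 + i\theta \sum \epsilon_1 f_1(\epsilon_1)\,d\epsilon_1 + \frac{(i\theta)^2}{2!} \sum \epsilon_1^2 f_1(\epsilon_1)\,d\epsilon_1 + \frac{(i\theta)^3}{3!} \sum \epsilon_1^3 f_1(\epsilon_1)\,d\epsilon_1 + \cdots = 1 + i\theta\,\mu_1 + \frac{(i\theta)^2}{2!}\,\mu_2 + \frac{(i\theta)^3}{3!}\,\mu_3 + \cdots$$

because by definition $\mu_1 = \sum \epsilon_1 f_1(\epsilon_1)\,d\epsilon_1$, $\mu_2 = \sum \epsilon_1^2 f_1(\epsilon_1)\,d\epsilon_1$, etc. Thus

$$G_1'(i\theta) = \mu_1 + i\theta\,\mu_2 + \frac{(i\theta)^2}{2!}\,\mu_3 + \cdots$$

and likewise from Taylor's expansion for $K_1(i\theta)$,

$$K_1'(i\theta) = K_1'(0) + K_1''(0)\,i\theta + K_1'''(0)\,\frac{(i\theta)^2}{2!} + \cdots$$

Hence the above identity may be written

$$\left[1 + i\theta\,\mu_1 + \frac{(i\theta)^2}{2!}\,\mu_2 + \cdots\right]\left[K_1'(0) + K_1''(0)\,i\theta + K_1'''(0)\,\frac{(i\theta)^2}{2!} + \cdots\right] = \mu_1 + i\theta\,\mu_2 + \frac{(i\theta)^2}{2!}\,\mu_3 + \cdots$$

If the left side of this identity is multiplied out and if the coefficients of the powers of $i\theta$ on the left are equated to the coefficients of the same powers of $i\theta$ on the right, it follows that, when $\epsilon_1$ is measured from its mean,

$$K_1'(0) = \mu_1 = 0, \quad K_1''(0) = \mu_2, \quad K_1'''(0) = \mu_3, \quad K_1''''(0) = \mu_4 - 3\mu_2^2$$

If $k_{11}$ is set equal to $K_1'(0)$, $k_{12}$ to $K_1''(0)$, $k_{13}$ to $K_1'''(0)$, etc., the equation for $\log G_1(i\theta) = K_1(i\theta)$ is seen to be equation (1) of the text.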
generating functions of the N individual contributions. Thus, if the first subscript of a cumulant k represents the contribution to which it pertains and the second the order of the cumulant, it follows that¹

$$\log G_X = (k_{12} + k_{22} + \cdots + k_{N2})\,\frac{(i\theta)^2}{2!} + (k_{13} + k_{23} + \cdots + k_{N3})\,\frac{(i\theta)^3}{3!} + (k_{14} + k_{24} + \cdots + k_{N4})\,\frac{(i\theta)^4}{4!} + \cdots \qquad (2)$$


Comparison of the coefficients of Eq. (2) with those of (1) indicates that the cumulants of the generating function of the distribution of X are the sums of the corresponding cumulants of the generating functions of the individual contributions. That is, if the cumulants of X are indicated by $K_2$, $K_3$, etc., it follows that

$$K_2 = k_{12} + k_{22} + \cdots + k_{N2}$$
$$K_3 = k_{13} + k_{23} + \cdots + k_{N3} \qquad (3)$$
$$K_4 = k_{14} + k_{24} + \cdots + k_{N4}, \quad \text{etc.}$$
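As a check on Eq. (3), the short Python sketch below builds the exact distribution of a sum of three independent, deliberately nonnormal contributions by convolution and compares the cumulants of the sum with the sums of the individual cumulants. The three contributions are hypothetical, chosen only for illustration; exact fractions are used so that the comparison is not blurred by rounding, and the cumulants are obtained from central moments through the relations given above ($k_2 = \mu_2$, $k_3 = \mu_3$, $k_4 = \mu_4 - 3\mu_2^2$).

```python
from fractions import Fraction as Fr
from functools import reduce


def convolve(p, q):
    """Distribution of the sum of two independent discrete contributions {value: probability}."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out


def cumulants(p):
    """(k2, k3, k4) from central moments: k2 = mu2, k3 = mu3, k4 = mu4 - 3*mu2**2."""
    mean = sum(x * w for x, w in p.items())
    mu = {r: sum((x - mean) ** r * w for x, w in p.items()) for r in (2, 3, 4)}
    return (mu[2], mu[3], mu[4] - 3 * mu[2] ** 2)


# Three hypothetical, deliberately nonnormal contributions (value: probability).
contributions = [
    {0: Fr(9, 10), 1: Fr(1, 10)},
    {-1: Fr(1, 3), 0: Fr(1, 3), 2: Fr(1, 3)},
    {0: Fr(1, 2), 3: Fr(1, 4), 5: Fr(1, 4)},
]

X = reduce(convolve, contributions)                    # exact distribution of the sum
per_part = [cumulants(c) for c in contributions]
summed = tuple(sum(col) for col in zip(*per_part))     # k_12 + k_22 + k_32, etc.

print(cumulants(X) == summed)  # True: each cumulant of X is the sum over the contributions
```

Nothing here depends on the contributions being alike or normal; only their independence is used, which is why the argument of the text can proceed from Eq. (3) alone.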


It is this last expression that permits certain inferences regarding the moment generating function of X and hence the distribution of X. Thus suppose that two additional assumptions are made regarding the distributions of the contributions of the individual causal factors, viz.: (1) that the number of these causal factors, N, is very large and (2) that the variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2$ of the individual contributions are all approximately of the same order of magnitude. The $k_{i2}$'s, it will be recalled, are equal to the variances of the individual contributions, and expression (3) shows that their sum gives the variance $K_2$ of X. The variances of physical, biological, and economic variables are generally of finite size so that for practical purposes it may be assumed that the variance of X is finite. Since it is equal to the sum of N variances, all of about the same order of magnitude, it can be concluded that each of the contributory variances is of the order of $1/N$ and that each contribution is of order $1/\sqrt{N}$.

From this it follows that the higher cumulants and moments are of higher order than $1/N$ (that is, smaller), for example, that $k_{13}$, $k_{23}$, etc.


¹ For, it is to be remembered, $k_{11} = 0$ on the assumption that $\epsilon_1$ is measured from its mean, and the same is true for $k_{21}$, $k_{31}$, etc., on the assumption that all the $\epsilon$'s are measured from their means.
are of order $1/\sqrt{N^3}$ and that $k_{14}$, $k_{24}$, etc., are of order $1/N^2$. Since $K_3$ is equal to the sum of the N cumulants $k_{13}$, $k_{23}$, etc., and $K_4$ is equal to the sum of the N cumulants $k_{14}$, $k_{24}$, etc., the magnitude of these higher cumulants of X is therefore of the order of $1/\sqrt{N}$ and $1/N$, respectively. Consequently, if the variance of X is finite and if N is very large, as is assumed, the size of $K_3$, $K_4$, and that of the higher cumulants of X will be very small. A first approximation to the generating function of the distribution of X is thus given by $\log G_X = K_2(i\theta)^2/2! = -K_2\theta^2/2$, or $G_X = \exp[-K_2\theta^2/2]$, and from this, by means of the Fourier integral theorem mentioned above, it may be shown¹ that a first approximation to the distribution of X itself is

$$f(x) = \frac{1}{\sqrt{2\pi K_2}} \exp\left[-\frac{x^2}{2K_2}\right]$$

which is the formula for the normal curve. The Gram-Charlier analysis thus shows that if the number of the contributory causes is very large and if their contributions are of about the same order of magnitude, i.e., if the variances of the contributions of the individual causal factors are roughly the same, the distribution of the resultant variable X will approximate the normal form.
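The order-of-magnitude argument can be made concrete with a hypothetical family of contributions. The sketch below assumes (this is not in the text) that each contribution is $c(E - 1)$, where E is a standard exponential variate, so that every single contribution is markedly skewed; taking $c = 1/\sqrt{N}$ gives each contribution variance $1/N$ and keeps the variance of X equal to 1. Adding cumulants as in Eq. (3) then gives $K_3 = 2/\sqrt{N}$ and $K_4 = 6/N$, which shrink at the rates stated above.

```python
import math


def cumulants_of_X(N):
    """K2, K3, K4 of X, the sum of N hypothetical contributions c*(E - 1).

    E is a standard exponential variate, whose r-th cumulant is (r-1)!, so each
    contribution has k_r = (r-1)! * c**r.  With c = 1/sqrt(N) every contribution
    has variance 1/N and, by Eq. (3), K_r = N * k_r.
    """
    c = 1 / math.sqrt(N)
    return {r: N * math.factorial(r - 1) * c ** r for r in (2, 3, 4)}


for N in (10, 100, 1000, 10000):
    K = cumulants_of_X(N)
    print(f"N={N:>6}  K2={K[2]:.3f}  K3={K[3]:.4f}  K4={K[4]:.5f}")
```

Multiplying N by 100 divides $K_3$ by 10 and $K_4$ by 100, while $K_2$ stays fixed; this is the sense in which the higher cumulants of X become negligible.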


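¹ Since, according to the Fourier integral theorem, the frequency distribution of X is $f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} G(\theta)\,e^{-i\theta x}\,d\theta$ when its moment generating function is $G(\theta) = \int_{-\infty}^{\infty} f(x)\,e^{i\theta x}\,dx$, it follows that, when $G(\theta) = \exp[-K_2\theta^2/2]$, then

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp\left[-\frac{K_2\theta^2}{2} - i\theta x\right] d\theta$$

But if the square in the exponent is completed, this becomes

$$f(x) = \frac{1}{2\pi}\,\exp\left[-\frac{x^2}{2K_2}\right]\int_{-\infty}^{\infty} \exp\left[-\frac{K_2}{2}\left(\theta + \frac{ix}{K_2}\right)^2\right] d\theta$$

But the integral is of the form $\int_{-\infty}^{\infty} e^{-K_2 z^2/2}\,dz$, which equals $\sqrt{2\pi/K_2}$. Hence,

$$f(x) = \frac{1}{\sqrt{2\pi K_2}}\exp\left[-\frac{x^2}{2K_2}\right]$$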
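The inversion in the footnote above can also be checked numerically. This minimal sketch evaluates $\frac{1}{2\pi}\int \exp[-K_2\theta^2/2 - i\theta x]\,d\theta$ by the trapezoidal rule over a finite range of θ and compares the result with the normal ordinate. The range, the number of steps, and the value $K_2 = 2.5$ are arbitrary choices of the sketch.

```python
import math


def inverted_density(x, K2, half_width=30.0, steps=6000):
    """(1/2pi) * integral of exp(-K2*theta**2/2 - i*theta*x) d(theta), trapezoidal rule.

    exp(-i*theta*x) = cos(theta*x) - i*sin(theta*x); the sine part is an odd function
    of theta and integrates to zero, so only the cosine part is summed.
    """
    h = 2 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        theta = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(-K2 * theta ** 2 / 2) * math.cos(theta * x)
    return total * h / (2 * math.pi)


def normal_density(x, K2):
    """The result of the footnote: (1 / sqrt(2 pi K2)) * exp(-x**2 / (2 K2))."""
    return math.exp(-x ** 2 / (2 * K2)) / math.sqrt(2 * math.pi * K2)


for x in (-3.0, 0.0, 1.7):
    print(x, inverted_density(x, 2.5), normal_density(x, 2.5))
```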
If the variances of the individual contributions are about the same (that is, if $k_{12}$, $k_{22}$, etc., are approximately equal) but the number of causal factors, N, is not large enough to warrant the dropping of terms of the order of $1/\sqrt{N}$ and $1/N$, although sufficiently large to warrant the dropping of terms of higher order (i.e., terms of order $1/\sqrt{N^3}$, $1/N^2$, etc.), then the moment generating function of the distribution of X will be given approximately¹ by

$$\log G_X = K_2\,\frac{(i\theta)^2}{2!} + K_3\,\frac{(i\theta)^3}{3!} + K_4\,\frac{(i\theta)^4}{4!}$$

From this generating function, the distribution of X is found to be²

¹ For $K_5$, $K_6$, and the higher cumulants of the generating function will all be of higher order than $1/N$; and, according to the assumption, terms of that order may be considered as negligible.

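² If

$$\log G_X = -\frac{K_2\theta^2}{2} + K_3\,\frac{(i\theta)^3}{3!} + K_4\,\frac{(i\theta)^4}{4!}$$

and

$$G_X = \exp\left[-\frac{K_2\theta^2}{2} + K_3\,\frac{(i\theta)^3}{3!} + K_4\,\frac{(i\theta)^4}{4!}\right]$$

then the distribution of x is given by

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left[-\frac{K_2\theta^2}{2} + K_3\,\frac{(i\theta)^3}{3!} + K_4\,\frac{(i\theta)^4}{4!} - i\theta x\right]d\theta$$

This may also be written, to within terms of order $1/N$,

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left[1 + K_3\,\frac{(i\theta)^3}{3!} + K_4\,\frac{(i\theta)^4}{4!}\right]\exp\left[-\frac{K_2\theta^2}{2} - i\theta x\right]d\theta$$

But as noted in the footnote to page 97,

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left[-\frac{K_2\theta^2}{2} - i\theta x\right]d\theta = \frac{1}{\sqrt{2\pi K_2}}\exp\left[-\frac{x^2}{2K_2}\right]$$

and successive differentiation with respect to x gives

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}(-i\theta)^3\exp\left[-\frac{K_2\theta^2}{2} - i\theta x\right]d\theta = \frac{d^3}{dx^3}\,\frac{1}{\sqrt{2\pi K_2}}\exp\left[-\frac{x^2}{2K_2}\right]$$

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}(-i\theta)^4\exp\left[-\frac{K_2\theta^2}{2} - i\theta x\right]d\theta = \frac{d^4}{dx^4}\,\frac{1}{\sqrt{2\pi K_2}}\exp\left[-\frac{x^2}{2K_2}\right]$$

Hence, since $(i\theta)^3 = -(-i\theta)^3$ and $(i\theta)^4 = (-i\theta)^4$, f(x) may be written in operational form,

$$f(x) = \left[1 - \frac{K_3}{3!}\,\frac{d^3}{dx^3} + \frac{K_4}{4!}\,\frac{d^4}{dx^4}\right]\frac{1}{\sqrt{2\pi K_2}}\exp\left[-\frac{x^2}{2K_2}\right]$$

or, actually working out the differentiation,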
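$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\left[1 + A\left(\frac{3x}{\sigma^4} - \frac{x^3}{\sigma^6}\right) + B\left(\frac{3}{\sigma^4} - \frac{6x^2}{\sigma^6} + \frac{x^4}{\sigma^8}\right)\right]\exp\left[-\frac{x^2}{2\sigma^2}\right]$$

where $x = X - \bar{X}$, σ is the standard deviation of x ($\sigma = \sqrt{K_2}$), $A = -K_3/3!$, and $B = K_4/4!$.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\left[1 + A\left(\frac{3x}{\sigma^4} - \frac{x^3}{\sigma^6}\right) + B\left(\frac{3}{\sigma^4} - \frac{6x^2}{\sigma^6} + \frac{x^4}{\sigma^8}\right)\right]\exp\left[-\frac{x^2}{2\sigma^2}\right]$$

where $\sigma^2 = K_2$, $A = -K_3/3!$, and $B = K_4/4!$.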
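A minimal Python sketch of this approximating curve, assuming the cumulants $K_2$, $K_3$, $K_4$ are already known (the values used below are hypothetical). The function follows the formula above term by term; the numerical integration then shows that the curve has total area 1, mean 0, variance $K_2$, third moment $K_3$, and fourth moment $3K_2^2 + K_4$, i.e., that it reproduces the cumulants from which it was built. The integration range and step count are assumptions of the sketch.

```python
import math


def gram_charlier(x, K2, K3, K4):
    """f(x) as above: sigma**2 = K2, A = -K3/3!, B = K4/4!, x measured from the mean."""
    s = math.sqrt(K2)
    A, B = -K3 / 6, K4 / 24
    bracket = (1
               + A * (3 * x / s**4 - x**3 / s**6)
               + B * (3 / s**4 - 6 * x**2 / s**6 + x**4 / s**8))
    return bracket * math.exp(-x**2 / (2 * s**2)) / (s * math.sqrt(2 * math.pi))


def moment(r, K2, K3, K4, half_width=20.0, steps=40000):
    """r-th moment of the curve about x = 0, by the trapezoidal rule."""
    h = 2 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * x**r * gram_charlier(x, K2, K3, K4)
    return total * h


K2, K3, K4 = 2.0, 0.5, 0.3
for r in range(5):
    print(r, round(moment(r, K2, K3, K4), 6))
# to rounding, these should be 1, 0, K2, K3, and 3*K2**2 + K4
```

Because the curve is built from only three cumulants, it can dip below zero in the tails when $K_3$ or $K_4$ is large relative to $K_2$; the approximation is meant for the case described in the text, where these are small.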


CHAPTER VI 


SUMMARY OF THE THEORY OF FREQUENCY CURVES, 
AND SOME EXAMPLES 


At the end of Chap. III a summary of the conditions leading to a normal curve was given. These will now be reviewed, and the conditions leading to nonnormality will be summarized.

Summary of Conditions Leading to Normality. The Pear- 
sonian and Gram-Charlier analyses suggest that a variable will 
tend to be normally distributed if (1) its fluctuations are the sum 
of the fluctuations in a number of contributory causes; (2) the 
fluctuations in the contributory causes are all of about the same 
order of magnitude; (3) the contributory causes act independently 
of each other; and (4) the number of contributory causes is large, 
while the contribution of each is relatively small. These conditions, it will be presumed, are sufficient to produce a normally distributed variable. It is not to be presumed, however, that they are all necessary. It is possible, for example, that the presence of (4) may under some conditions make (3) unnecessary.

Summary of Conditions Giving Rise to Nonnormality. The 
conditions that are likely to give rise to nonnormal frequency 
curves are in general the negation of the conditions that give 
rise to normal frequency curves. Thus, the distribution of a variable X may fail to be normal for any one of the following reasons:

1. If the deviations of X from some central value are simply an algebraic sum of the contributions of a number of causal factors, if these contributions are of about the same order of magnitude, and if they are independent of each other, the distribution of X may not be normal (even approximately) if the distributions of the individual contributions are themselves nonnormal and if the number of causal factors is not exceptionally large.

2. If the deviations of X from some central value are an algebraic sum of the contributions of a number of causal factors and if these act independently of each other, the distribution of X will not be normal if the contributions of a few of the causal factors are of outstanding importance as compared with the others and if the distributions of these contributions are themselves not normal. If the magnitude of one contribution, for example, overshadowed that of all the others, then the distribution of X would tend to conform to the distribution of that particular contribution.

In this connection it is to be noted that, if one or more contributing factors are of outstanding importance compared with other causes of variation, the conditions producing variation in X might possibly be considered as nonhomogeneous. This is especially likely to be true if the factors in question are qualitative in nature.

[Fig. 25. Combination of two normal distributions to form a nonnormal distribution (see Table 13). Vertical axis: percentage of men or women; horizontal axis: height in inches.]

Consider, for example, the heights of adult human beings. Of all the single factors affecting mature height, sex and race are among the most important. Their effects are so outstanding that unless they are removed, by statistical classification, they overshadow the effects of most other causes of variation among mature persons. Furthermore, both sex and race are qualitative factors that make a contribution to height of one amount when one sex or race is present and a contribution of a distinctly different amount when another sex or race is present. The net effect is to produce a variable, viz., adult height, that is not normally distributed.
The effects of sex and race being of this sort, it is the usual practice to view height in general as a heterogeneous variable whose frequency distribution has little significance. For when adult human beings are segregated according to sex and race, their heights form a distinctly normal distribution. This serves to illustrate how a distribution may be nonnormal because the variable in question is nonhomogeneous.


TABLE 13. AN ILLUSTRATION OF HOW THE COMBINATION OF TWO NORMAL DISTRIBUTIONS PRODUCES A NONNORMAL DISTRIBUTION

Height in       (1)           (2)           (3)
inches       Frequency¹    Frequency²    Combined distribution
56-             ...           .27           .27
57-             ...           .91           .91
58-             ...          2.59          2.59
59-             ...          6.43          6.43
60-             ...         13.69         13.69
61-             ...         23.87         23.87
62-             .27         35.19         35.46
63-             .91         44.91         45.82
64-            2.59         48.45         51.04
65-            6.43         44.36         50.79
66-           13.69         34.62         48.31
67-           23.87         23.02         46.89
68-           35.19         12.83         48.02
69-           44.91          6.05         50.96
70-           48.45          2.47         50.92
71-           44.36           .86         45.22
72-           34.62           .24         34.86
73-           23.02           ...         23.02
74-           12.83           ...         12.83
75-            6.05           ...          6.05
76-            2.47           ...          2.47
77-             .86           ...           .86
78-             .24           ...           .24

¹ From the normal curve fitted to the heights of the 300 Princeton freshmen.
² Same as column (1) except that the mean height is 6 in. shorter.

Figure 25 and Table 13 offer an example of the production of a nonnormal distribution through the combination of two normal distributions with different mean values. This is what the distribution of the heights of white adults would be like. It is of little significance because the important effect of the sex factor has not been separated from the minor effects of the many other factors influencing height.

3. If the deviations of X from some central value are an algebraic sum of the contributions of a number of causal factors and if these are all of about the same order of magnitude, the distribution of X will not be normal if the amount of the contribution of a causal factor is dependent on the contributions of other causal factors. It is possible, however, that even under these conditions the distribution of X will be approximately normal if the number of contributory causes is very large.
4. If the deviations in X from some normal value are not a simple algebraic sum of the contributions of a number of causal factors but are related to them in a more complex manner, it is likely that the distribution of X will not be normal. Thus suppose that X varies as the cube of the sum of the individual contributions instead of as the sum itself. Then, although the sum may be normally distributed, X will be distributed in a skewed manner,¹ provided the sum varies about some value other than zero, as height does. It follows that where certain "linear" measurements of individuals are normally distributed, other "nonlinear" measurements of the same individuals will not be normally distributed.² For example, the heights of adult males of the white race make up a set of homogeneous linear measurements, while the weights of these individuals compose a set of nonlinear measurements in the sense that the weight of an individual is highly correlated with his "volume," a quantity that is likely to vary with the cube of height rather than the height itself. In actuality, the heights of adult white males are found to be normally distributed, while their weights are definitely skewed, thus giving an empirical illustration of the relationship between the distribution of a sum and the distribution of its cube.

¹ See p. 63. Also see Markoff, A., Wahrscheinlichkeitsrechnung (B. G. Teubner, Leipzig, 1912), Appendices II and III.
² Rietz, H. L., Mathematical Statistics, pp. ...; also "... Obtained by Certain Transformations of Normally Distributed Variates," Annals of Mathematics, Vol. 23 (1922), pp. 292-300.

5. In some cases data that are selected from a larger normal group of data are themselves nonnormally distributed because
of the special way in which they are selected. Suppose, for example, that the United States Army refuses to take adult males whose heights are less than 64 or greater than 74 inches. The distribution of the heights of Army men would then be a "truncated" normal distribution, such as that pictured in Fig. 26. The extension of the curve by a dotted line recognizes the possibility that a few exceptionally qualified men would be accepted in spite of the rule. Again, suppose that the students who select courses in higher mathematics are only the better students, say the upper two thirds. Although the intelligence quotients of all students would be normally distributed, those of mathematics students would be more or less like the upper part of a normal curve. This is illustrated in Fig. 27, the rounded tail below 80 indicating that a few below that score would be allowed to enter the mathematics course despite the general rule.

[Fig. 26. Nonnormal distribution of heights resulting from special selection. Vertical axis: percentage of Army men; horizontal axis: height in inches.]
On the other hand, the distribution of I.Q.'s of students taking courses in a notoriously easy field of ... would be more or less like the lower part of a normal curve, as in Fig. 28. In all these cases ... of the previous chapter ... character of the data. The nonnormality of the subgroup results only from special selection from a larger normal group.¹

¹ Note that the nonnormality of the subgroup arises from its selection, not at random, but with definite reference to the attribute of its members. A nonnormal subgroup may be obtained from a larger normal group in this way even though the latter is homogeneous.

These are the principal reasons, it would appear, for the occurrence of nonnormal frequency distributions.
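Two of the reasons listed above, heterogeneity (reason 2 and Table 13) and special selection (reason 5), can be checked with a short Python sketch. The first part rebuilds columns (2) and (3) of Table 13 from column (1) and computes Pearson's $\beta_2 = \mu_4/\mu_2^2$ from the grouped frequencies, using class midpoints and no correction for grouping. The second part uses the standard formulas for a normal curve cut off below a point a to compute the skewness of the upper two thirds of a normal population, as in the text's hypothetical mathematics class. This is an illustration only, not the author's computation.

```python
from statistics import NormalDist

# Column (1) of Table 13: frequencies for the height classes 62-, 63-, ..., 78- (inches).
COL1 = [0.27, 0.91, 2.59, 6.43, 13.69, 23.87, 35.19, 44.91, 48.45,
        44.36, 34.62, 23.02, 12.83, 6.05, 2.47, 0.86, 0.24]


def table13():
    """Return columns (1), (2), (3) as {lower class limit: frequency}.

    Column (2) is column (1) moved 6 in. shorter; column (3) is their sum.
    """
    col1 = {62 + i: f for i, f in enumerate(COL1)}
    col2 = {h - 6: f for h, f in col1.items()}
    heights = sorted(set(col1) | set(col2))
    col3 = {h: col1.get(h, 0.0) + col2.get(h, 0.0) for h in heights}
    return col1, col2, col3


def beta2(freq):
    """Pearson's beta_2 = mu_4 / mu_2**2 from grouped data (class midpoints, no grouping correction)."""
    n = sum(freq.values())
    mid = {h: h + 0.5 for h in freq}
    mean = sum(mid[h] * f for h, f in freq.items()) / n
    m2 = sum((mid[h] - mean) ** 2 * f for h, f in freq.items()) / n
    m4 = sum((mid[h] - mean) ** 4 * f for h, f in freq.items()) / n
    return m4 / m2 ** 2


def skewness_truncated_below(a):
    """Skewness of a standard normal variate retained only when it exceeds a."""
    z = NormalDist()
    lam = z.pdf(a) / (1 - z.cdf(a))              # mean of the retained part
    var = 1 + a * lam - lam ** 2
    mu3 = lam * (a * a - 1 - 3 * a * lam + 2 * lam ** 2)
    return mu3 / var ** 1.5


col1, col2, col3 = table13()
print(round(col3[62], 2), round(col3[70], 2))         # 35.46 50.92, as in column (3)
print(round(beta2(col1), 2), round(beta2(col3), 2))   # close to 3 for (1); well below 3 for (3)

cut = NormalDist().inv_cdf(1 / 3)                     # keep the upper two thirds
print(round(skewness_truncated_below(cut), 3))        # positive, as in Fig. 27
```

A $\beta_2$ well below 3 is the numerical trace of the flat-topped combined distribution of Fig. 25; by symmetry, keeping the lower part of a normal population instead (Fig. 28) gives the same skewness with the opposite sign.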
[Fig. 27. Nonnormal distribution of grades resulting from special selection (positively skewed).]

[Fig. 28. Nonnormal distribution of grades resulting from special selection (negatively skewed).]


EXAMPLES OF NONNORMAL FREQUENCY DISTRIBUTIONS


Examples from Everyday Life. Figure 29 shows the distribution of weights of 300 Princeton freshmen of the class of 1943. The general shape and the values $\beta_1 = .36795$ and $\beta_2 = 4.6057$ (corrected for grouping) indicate a definite departure from normality. Since height is a linear measurement and since weight is related to volume, which is a cubical measurement, it is possible, as suggested in the previous section, that the departure of weights from normality is due to the variation in weight being equal to the cube of a sum of elementary variations rather than to the sum itself.

[Fig. 29. Histogram and fitted Gram-Charlier curve, distribution of weights of 300 Princeton freshmen. Vertical axis: number of freshmen; horizontal axis: weight in pounds, 100 to 240.]
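Whether cubing a normally distributed sum produces skewness depends on where the sum is centered. The sketch below computes the exact skewness of $X = Y^3$ for a normal Y with mean m and standard deviation s, using the normal moment recursion $E[Y^j] = m\,E[Y^{j-1}] + (j-1)s^2E[Y^{j-2}]$ in exact fractions. A sum centered at zero gives a symmetrical cube; a sum with a positive mean gives positive skewness. The 70 in. and 2.5 in. used here are hypothetical height-like values, and the sketch illustrates only the mechanism suggested above; it is not an analysis of the Princeton weights.

```python
from fractions import Fraction


def normal_raw_moments(m, s, top):
    """E[Y**j], j = 0..top, for Y normal with mean m and s.d. s, computed exactly from
    the recursion E[Y**j] = m*E[Y**(j-1)] + (j-1)*s**2*E[Y**(j-2)]."""
    mom = [Fraction(1), Fraction(m)]
    for j in range(2, top + 1):
        mom.append(m * mom[j - 1] + (j - 1) * s * s * mom[j - 2])
    return mom


def skewness_of_cube(m, s):
    """Skewness mu3 / sigma**3 of X = Y**3, where Y is a normally distributed sum."""
    EY = normal_raw_moments(Fraction(m), Fraction(s), 9)
    e1, e2, e3 = EY[3], EY[6], EY[9]          # E[X], E[X**2], E[X**3]
    var = e2 - e1 ** 2
    mu3 = e3 - 3 * e1 * e2 + 2 * e1 ** 3
    return float(mu3) / float(var) ** 1.5


print(skewness_of_cube(0, 1))                 # 0.0: a sum centered at zero gives a symmetrical cube
print(round(skewness_of_cube(70, 2.5), 3))    # positive: "height" 70 in., s.d. 2.5 in.
```

For small relative variation the skewness comes out at roughly six times the ratio s/m, so the effect grows with the relative spread of the underlying sum.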


The distribution of family incomes is a distinctly nonnormal distribution. This is illustrated by the distribution of family incomes in the United States in 1935-1936, as shown in Fig. 30. There are various causes for this departure from normality. Possibly one important cause is that, the more money a family has, the easier it is to get still more money. That is, it is likely that a principal cause of the nonnormality of the distribution of incomes is the lack of independence of the factors contributing to variation.
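The suggested mechanism, that money makes money, can be illustrated with a toy simulation. The model below is entirely hypothetical: every family starts at the same income and receives 50 random changes. In one version each change is a dollar amount that does not depend on the income already held; in the other it is proportional to that income, so each contribution depends on the contributions that came before it. Only the second version yields a markedly skewed distribution.

```python
import random
from statistics import fmean, pstdev


def skewness(xs):
    """Sample skewness: mean of the cubed standardized deviations."""
    m = fmean(xs)
    s = pstdev(xs, m)
    return fmean([((x - m) / s) ** 3 for x in xs])


def simulate(proportional, families=5000, periods=50, start=1000.0, seed=1):
    """Hypothetical toy model of family incomes after `periods` random changes.

    additive:      each change is start * shock, the same scale for every family
    proportional:  each change is x * shock, proportional to the income already held
    """
    rng = random.Random(seed)
    incomes = []
    for _ in range(families):
        x = start
        for _ in range(periods):
            shock = rng.uniform(-0.1, 0.1)
            x += x * shock if proportional else start * shock
        incomes.append(x)
    return incomes


print(round(skewness(simulate(False)), 2))   # close to 0: independent, additive changes
print(round(skewness(simulate(True)), 2))    # clearly positive: changes proportional to income
```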
[Fig. 30. Distribution of family incomes in the United States, 1935-1936. Vertical axis: percentage of families; horizontal axis: income in dollars, 0 to 5000. (National Resources Committee, Consumer Income in the United States, p. 3.)]

Examples from Sampling Analysis. Some of the better known forms of nonnormal frequency distributions are produced by the process of random sampling. The theory of sampling that provides the mathematical models for various types of sampling problems is discussed at some length in Parts II and III of this volume. It will be sufficient here to call attention to certain phases of sampling theory that are closely related to the discussion of the previous chapters and serve to illustrate the generation of nonnormal frequency curves. It is to be noted that some of the sampling distributions here discussed are nonnormal only when the samples are relatively small and tend toward normality as the size of the sample is increased. This tendency will be noted in the discussion so that the conditions producing normality and nonnormality will be clearly contrasted.

Sampling Distribution of the Mean for Any Population. The theoretical discussion of the previous sections offers considerable information about the distribution of the means of samples drawn at random from a large (theoretically infinite) population. For it will be noted that the mean is merely 1/N times the sum of the individual cases. The variations in the individual cases from sample to sample are thus the "contributory causes" (the $\epsilon$'s of the Gram-Charlier analysis) of the sampling variation in the mean. Since the individual cases all come from the same population,¹ their individual distributions are all alike [that is,

¹ The population is assumed to be so large that the successive withdrawals
$f_1(\epsilon_1) = f_2(\epsilon_2) = f_3(\epsilon_3)$, etc.]. Hence the cumulants of the distribution of the mean will be $1/N^{k-1}$ times the cumulants of the population from which the samples are taken, k being the order of the cumulant.¹

Since the second cumulant equals the variance, it follows that the variance of the distribution of the mean will equal 1/N times the variance of the population. Likewise, the third cumulant of the distribution of the mean (which equals the third moment about the mean) equals $1/N^2$ times the third cumulant (i.e., moment) of the population, and the fourth cumulant of the distribution of the mean equals $1/N^3$ times the fourth cumulant of the population. Therefore, the distribution of sample means will have the same general form as the population from which the samples were drawn. Its variance, however, will be less, its skewness much less, and its kurtosis (that is, its departure from the normal value) very much less than that of the population. In fact, if the sample is large, the skewness and kurtosis will be practically nonexistent and the distribution of sample means will become practically normal.
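The $1/N^{k-1}$ rule can be verified exactly for a small case. The sketch below takes a hypothetical, positively skewed three-point population, builds the exact distribution of the mean of N = 5 independent draws by repeated convolution, and compares its cumulants (computed from central moments with exact fractions) with $k_2/N$, $k_3/N^2$, and $k_4/N^3$. The last two lines show the standardized skewness $k_3/k_2^{3/2}$ and kurtosis measure $k_4/k_2^2$ shrinking by the factors $1/\sqrt{N}$ and $1/N$.

```python
from fractions import Fraction as Fr
from functools import reduce


def convolve(p, q):
    """Distribution of the sum of two independent discrete variates {value: probability}."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out


def cumulants(p):
    """(k2, k3, k4) from central moments: k2 = mu2, k3 = mu3, k4 = mu4 - 3*mu2**2."""
    mean = sum(x * w for x, w in p.items())
    mu = {r: sum((x - mean) ** r * w for x, w in p.items()) for r in (2, 3, 4)}
    return (mu[2], mu[3], mu[4] - 3 * mu[2] ** 2)


def distribution_of_mean(population, N):
    """Exact distribution of the mean of N independent draws from an (infinite) population."""
    total = reduce(convolve, [population] * N)
    return {Fr(s, N): w for s, w in total.items()}


population = {0: Fr(1, 2), 1: Fr(1, 4), 4: Fr(1, 4)}   # hypothetical, positively skewed
N = 5
k2, k3, k4 = cumulants(population)
K2, K3, K4 = cumulants(distribution_of_mean(population, N))
print(K2 == k2 / N, K3 == k3 / N**2, K4 == k4 / N**3)   # True True True

# Standardized skewness and kurtosis measure: population versus mean of 5
print(float(k3) / float(k2) ** 1.5, float(K3) / float(K2) ** 1.5)
print(float(k4) / float(k2) ** 2, float(K4) / float(K2) ** 2)
```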

Sampling Distribution of Any Linear Function. What is true of the mean is also true of any statistic that is a linear function of the individual variables. For example, consider the regression coefficient $b_{12} = \sum x_1x_2 / \sum x_2^2$. If samples are drawn from a bivariate population in such a manner that the $x_2$ values are always the same and only the $x_1$ values change, then $b_{12}$ becomes merely a linear function of the sample $x_1$'s, viz.,

$$b_{12} = A_1\,x_1[1] + A_2\,x_1[2] + \cdots + A_n\,x_1[n]$$

where the A's are dependent on the given values of the $x_2$'s and the numbers in the brackets differentiate the various sample values of $x_1$. The cumulants of the sampling distribution of $b_{12}$ are accordingly related to the cumulants of the individual $x_1$'s as follows:

$$K_2 = A_1^2 k_{12} + A_2^2 k_{22} + \cdots + A_n^2 k_{n2}$$
$$K_3 = A_1^3 k_{13} + A_2^3 k_{23} + \cdots + A_n^3 k_{n3}$$
$$K_4 = A_1^4 k_{14} + A_2^4 k_{24} + \cdots + A_n^4 k_{n4}$$


of the members of a sample do not materially affect the distribution of probabilities in the population.

¹ The cumulants of the sum of the cases would be N times the cumulants of the population, and therefore the cumulants of the mean (which equals 1/Nth of the sum) would be $1/N^{k-1}$ times the cumulants of the population. Note that the kth cumulant of f(X/N) is $1/N^k$ times the kth cumulant of f(X).
in which $k_{12}$ means the second cumulant of the first case in the sample of $x_1$'s, that is, $x_1[1]$; $k_{13}$ means the third cumulant of $x_1[1]$; $k_{n4}$ means the fourth cumulant of the nth case in the sample of $x_1$'s, that is, $x_1[n]$; etc.

The variance of the sampling distribution of $b_{12}$ will consequently be directly related to the variances of the individual $x_1$'s, being their weighted sum. Similarly, the third and fourth cumulants of the sampling distribution of $b_{12}$, and hence its skewness and kurtosis, will be determined by weighted sums of the corresponding cumulants of the distributions of the individual $x_1$'s. If the distributions of the $x_1$'s for each $x_2$ are all alike in form (but not necessarily having the same mean, variance, etc.), the sampling distribution of $b_{12}$ will have exactly the same sort (but not the same degree) of skewness and kurtosis as the individual $x_1$'s.
It must be noted, however, that the "weights" that enter into the relationships between the cumulants of $b_{12}$ and those of the individual $x_1$'s (i.e., the $A_1, A_2, \ldots, A_n$) are equal to

$$A_i = \frac{x_2[i]}{\sum x_2^2} = \frac{x_2[i]}{N\sigma_2^2} \qquad \left(\text{for } \sum x_2^2 = N\sigma_2^2\right)$$

and if the variances of the individual $x_1$'s were identical or were of the same order of magnitude, the variance of $b_{12}$ would be 1/N times the order of magnitude of the variances of the individual $x_1$'s, and the third and fourth cumulants would be $1/N^2$ and $1/N^3$ times the order of magnitude of the corresponding cumulants of the individual $x_1$'s. Consequently, as in the case of the mean, the sampling distribution of the regression coefficient $b_{12}$ will have much less skewness and much less kurtosis than the original distributions of the individual $x_1$'s. In fact, if the size of the sample is large, the distribution of $b_{12}$ will be practically normal. This will be true whether the distributions of the individual $x_1$'s for the various $x_2$'s are or are not normal or are or are not alike.
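The effect of the weights $A_i$ can be seen numerically. The sketch below assumes a hypothetical lopsided set of fixed $x_2$ values (the squares 1, 4, 9, ..., taken as deviations from their mean) and gives every $x_1$ the cumulants of a standard exponential variate ($k_2 = 1$, $k_3 = 2$, $k_4 = 6$), a strongly skewed choice. It evaluates $K_2$, $K_3$, $K_4$ of $b_{12}$ from the relations above and prints the standardized skewness $K_3/K_2^{3/2}$ and kurtosis measure $K_4/K_2^2$, which fall roughly like $1/\sqrt{N}$ and $1/N$.

```python
def b12_cumulants(x2, k=(1.0, 2.0, 6.0)):
    """K2, K3, K4 of b12 = sum(A_i * x1[i]) with A_i = x2[i] / sum(x2**2).

    Every x1 is assumed (hypothetically) to have cumulants k = (k2, k3, k4);
    the defaults are those of a standard exponential variate.
    """
    s = sum(v * v for v in x2)
    A = [v / s for v in x2]
    return tuple(sum(a ** r for a in A) * kr for r, kr in zip((2, 3, 4), k))


def lopsided_design(N):
    """Hypothetical fixed x2 values: the squares 1, 4, 9, ..., as deviations from their mean."""
    raw = [i * i for i in range(1, N + 1)]
    mean = sum(raw) / N
    return [v - mean for v in raw]


for N in (25, 100, 400):
    K2, K3, K4 = b12_cumulants(lopsided_design(N))
    print(N, round(K3 / K2 ** 1.5, 4), round(K4 / K2 ** 2, 5))
```

If the $x_2$ values were placed symmetrically about their mean, the $A_i^3$ would cancel and $K_3$ would vanish altogether; the lopsided design is chosen so that some skewness survives to be watched as it shrinks.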
The t Distribution. A number of sample statistics from normal populations have sampling distributions that are nonnormal because they are not linear functions of the cases, although the distributions all approach normality as the size of the sample is increased. Three of the better known of these nonnormal sampling distributions are the "t distribution,"¹ the "χ² distribution," and the "F distribution."² The occasions on which these distributions arise will be discussed in subsequent chapters. It is the purpose here merely to describe these three important nonnormal sampling distributions.

¹ The t distribution is also referred to as "Student's distribution," after


[Fig. 31. The standard normal curve compared to the curve of Student's distribution.]


A graph of the t distribution is shown in Fig. 31.³ It will be noticed that the curve is symmetrical about the mean and looks very much like the normal curve. It has larger tails than the normal curve, however, as can be seen from a comparison of the two curves.

the pseudonym of the man (W. S. Gosset) who first called attention to it.
² Sometimes the variable $z = \frac{1}{2}\log_e F$ is used instead of F. The distribution of $z = \frac{1}{2}\log_e F$ is shown on page 114.
³ It will be noted that the vertical scales of figures showing discrete probabilities are labeled "Probabilities," or "P(x)," "P(H)," which mean "probability of an x" and "probability of an H," etc. Vertical scales of figures showing probability curves cannot properly be so labeled; for in the case of a continuous curve the height of the curve is merely an ordinate. If the vertical scale is labeled f(x), then the probability, i.e., P(x), of a case falling in the infinitesimal range x to x + dx is f(x) dx. For a finite range, $x_1$ to $x_2$, the probability is the area under the curve for that range. ... $y_t$, $y_F$, $y_{\chi^2}$, etc., are used as equivalents of f(t), f(F), f(χ²), etc. In this book, the vertical scales of probability curves are, generally speaking, not labeled.
The formula for the t distribution is

$$dF = \frac{\left(\frac{n-1}{2}\right)!}{\sqrt{n\pi}\,\left(\frac{n-2}{2}\right)!}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}} dt \qquad (1)$$

where n is a constant that determines the shape of the curve, just as $\bar{X}$ and σ do in the case of the normal curve.

In general, the formula says that the infinitesimal portion of the total area cut off by the infinitesimal class interval t to t + dt is equal approximately to the area of the rectangle whose height is

$$\frac{\left(\frac{n-1}{2}\right)!}{\sqrt{n\pi}\,\left(\frac{n-2}{2}\right)!}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}$$

and whose base is dt. The t curve has its mode at 0 and tapers off symmetrically in both directions as t goes to +∞ and −∞. Its mean is 0 and its variance is $n/(n-2)$ (for n > 2). Its skewness is zero, and its kurtosis is greater than 3 (for n > 4) but approaches 3 as n increases. In general, the t curve approaches the normal curve as n increases; in fact, the standardized normal curve is a good approximation to the t curve for values of n > 30.¹
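The stated properties of the t curve can be checked by numerical integration of Eq. (1). In this sketch the book's factorials are evaluated with the gamma function, $x! = \Gamma(x+1)$, through Python's `math.lgamma`; n = 10 and the integration settings are arbitrary choices. For n = 10 the text's formulas give variance $n/(n-2) = 1.25$, and the standard result for the kurtosis is $3 + 6/(n-4) = 4$.

```python
import math


def t_ordinate(t, n):
    """Height of the t curve, Eq. (1); ((n-1)/2)! = Gamma((n+1)/2), ((n-2)/2)! = Gamma(n/2)."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const) * (1 + t * t / n) ** (-(n + 1) / 2)


def t_moment(r, n, half_width=100.0, steps=100_000):
    """r-th moment about 0 by the trapezoidal rule on [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        t = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * t ** r * t_ordinate(t, n)
    return total * h


n = 10
var = t_moment(2, n)
print(round(t_moment(0, n), 6), round(var, 4), round(t_moment(4, n) / var ** 2, 3))
# compare: area 1, variance n/(n-2), kurtosis 3 + 6/(n-4)

# Ordinate at t = 0 for n = 30 beside the normal ordinate 1/sqrt(2*pi)
print(round(t_ordinate(0, 30), 4), round(1 / math.sqrt(2 * math.pi), 4))
```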

The χ² Distribution. The χ² distribution is a positively skewed distribution, a picture of which is given in Fig. 32. The mathematical formula for the curve is

$$dF = \frac{1}{2^{n/2}\left(\frac{n-2}{2}\right)!}\,(\chi^2)^{\frac{n-2}{2}}\,e^{-\chi^2/2}\,d(\chi^2) \qquad (2)$$

where n is a quantity that determines the shape and position of the curve. In general, the formula says that the infinitesimal portion of the total area cut off by the infinitesimal class interval


¹ A better approximation for values between 30 and 100, say, is given by taking $\sigma = \sqrt{n/(n-2)}$ instead of 1.
χ² to χ² + dχ² is equal approximately to the area of a rectangle whose height is

$$\frac{1}{2^{n/2}\left(\frac{n-2}{2}\right)!}\,(\chi^2)^{\frac{n-2}{2}}\,e^{-\chi^2/2}$$

and whose base is d(χ²).

The χ² curve (for n > 2) begins at 0, rises to a peak at χ² = n − 2, and falls again to zero as χ² goes to infinity. The mean of the curve is at n, and its standard deviation is $\sqrt{2n}$. The skewness of the curve, as measured by (Mean − Mode)/σ, is $\sqrt{2/n}$. There are thus different χ² curves for different values of n. As n varies, the curve changes both its position and its shape. For larger values of n, the curve is located further along the χ² axis and is more spread out and more symmetrical. Figure 32 pictures two different χ² curves, one for n = 4, the other for n = 14. In general, as n gets larger, the χ² curve approaches the normal curve; the normal curve is, in fact, the limiting form of the χ² curve.

[Fig. 32. The χ² curves for n = 4 (mode = 2, mean = 4) and for n = 14 (mode = 12, mean = 14).]
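A similar numerical check of Eq. (2), using the gamma function for the factorial and a simple grid (the grid spacing and upper limit are assumptions of the sketch), recovers the mode n − 2, the mean n, the standard deviation $\sqrt{2n}$, and the skewness measure (Mean − Mode)/σ = $\sqrt{2/n}$ for the two curves of Fig. 32.

```python
import math


def chi2_ordinate(x, n):
    """Height of the chi-square curve, Eq. (2); ((n-2)/2)! = Gamma(n/2). Assumes n > 2."""
    if x <= 0:
        return 0.0
    log_y = (-(n / 2) * math.log(2) - math.lgamma(n / 2)
             + ((n - 2) / 2) * math.log(x) - x / 2)
    return math.exp(log_y)


def summary(n, upper=200.0, h=0.001):
    """Numerical mode, mean, s.d., and (mean - mode)/s.d. of the chi-square curve."""
    xs = [i * h for i in range(int(upper / h) + 1)]
    ys = [chi2_ordinate(x, n) for x in xs]
    mode = xs[max(range(len(ys)), key=ys.__getitem__)]
    # The end ordinates are ~0, so the trapezoidal rule reduces to a plain sum times h.
    m0 = sum(ys) * h
    mean = sum(x * y for x, y in zip(xs, ys)) * h / m0
    var = sum((x - mean) ** 2 * y for x, y in zip(xs, ys)) * h / m0
    sd = math.sqrt(var)
    return mode, mean, sd, (mean - mode) / sd


for n in (4, 14):
    mode, mean, sd, skew = summary(n)
    print(n, round(mode, 3), round(mean, 4), round(sd, 4), round(skew, 4))
    print("   text:", n - 2, n, round(math.sqrt(2 * n), 4), round(math.sqrt(2 / n), 4))
```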

The F Distribution.¹ The F distribution is positively skewed like the χ² distribution. A picture of the F distribution is shown in Fig. 33. The mathematical formula for the curve is as follows:

$$dP = \frac{\left(\frac{n_1+n_2-2}{2}\right)!\; n_1^{n_1/2}\, n_2^{n_2/2}}{\left(\frac{n_1-2}{2}\right)!\left(\frac{n_2-2}{2}\right)!}\cdot\frac{F^{(n_1-2)/2}}{(n_1F+n_2)^{(n_1+n_2)/2}}\,dF \qquad (3)$$

in which $n_1$ and $n_2$ are constants, such as n of the t curve and the χ² curve, which determine the character of the curve.

[Fig. 33. The F curve for $n_1 = 4$, $n_2 = 3$.]

The formula for the F curve says that the infinitesimal portion of the total area cut off by the infinitesimal class interval F to F + dF is equal to the area of an infinitesimal rectangle whose height is

$$\frac{\left(\frac{n_1+n_2-2}{2}\right)!\; n_1^{n_1/2}\, n_2^{n_2/2}}{\left(\frac{n_1-2}{2}\right)!\left(\frac{n_2-2}{2}\right)!}\cdot\frac{F^{(n_1-2)/2}}{(n_1F+n_2)^{(n_1+n_2)/2}}$$

and whose base is dF.

The F curve (for $n_1 > 2$) starts at zero, rises to a peak at $F = \frac{n_2(n_1-2)}{n_1(n_2+2)}$, and falls again to zero as F goes to infinity. Its mean is $\frac{n_2}{n_2-2}$ (for $n_2 > 2$) and its standard deviation is $\frac{n_2}{n_2-2}\sqrt{\frac{2(n_1+n_2-2)}{n_1(n_2-4)}}$ (for $n_2 > 4$). The curve thus varies with $n_1$ and $n_2$. As $n_1$ and $n_2$ get larger, the F curve

¹ The reader is warned that the F of the F distribution and the F curve formula has no relationship to the F symbol employed to represent frequencies.
tends to become more symmetrical; and as $n_1$ and $n_2$ both approach infinity, the F curve approaches the normal curve. If $n_2$ approaches infinity while $n_1$ remains small, the distribution of $n_1F$ approaches the χ² curve with $n = n_1$. If $n_1 = 1$, $\sqrt{F}$ is distributed as t (without regard to sign) with $n = n_2$. Thus the normal curve, the t curve, and the χ² curve are all special cases of the F curve.¹
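Eq. (3) can be checked the same way. Because the F curve has a long right tail, this sketch integrates over $s = \log F$ (so that $dF = F\,ds$) rather than over F itself, and the book's factorials are again evaluated with `math.lgamma`. The values $n_1 = 4$, $n_2 = 10$ are a hypothetical choice made so that the mean and standard deviation both exist; the printed values can be compared with the formulas for the mode, mean, and standard deviation given above.

```python
import math


def f_ordinate(F, n1, n2):
    """Height of the F curve, Eq. (3), with ((k-2)/2)! written as Gamma(k/2)."""
    if F <= 0:
        return 0.0
    log_c = (math.lgamma((n1 + n2) / 2) - math.lgamma(n1 / 2) - math.lgamma(n2 / 2)
             + (n1 / 2) * math.log(n1) + (n2 / 2) * math.log(n2))
    return math.exp(log_c + ((n1 - 2) / 2) * math.log(F)
                    - ((n1 + n2) / 2) * math.log(n1 * F + n2))


def f_moment(r, n1, n2, lo=-30.0, hi=12.0, steps=42_000):
    """r-th moment about 0, integrating over s = log F (dF = F ds) by the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        F = math.exp(lo + j * h)
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * F ** r * f_ordinate(F, n1, n2) * F
    return total * h


n1, n2 = 4, 10
mean = f_moment(1, n1, n2)
sd = math.sqrt(f_moment(2, n1, n2) - mean ** 2)
grid = [i / 10_000 for i in range(1, 50_001)]
mode = max(grid, key=lambda F: f_ordinate(F, n1, n2))

print(round(f_moment(0, n1, n2), 6), round(mean, 4), round(sd, 4), round(mode, 4))
print(n2 / (n2 - 2),
      n2 / (n2 - 2) * math.sqrt(2 * (n1 + n2 - 2) / (n1 * (n2 - 4))),
      n2 * (n1 - 2) / (n1 * (n2 + 2)))
```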


¹ As already pointed out on p. 109n., the variable $z = \frac{1}{2}\log_e F$ is sometimes used instead of F. It is this z distribution that R. A. Fisher has called attention to as the one general sampling distribution. Cf. Paul R. Rider, An Introduction to Modern Statistical Methods (1939), who cites J. O. Irwin, "Mathematical Theorems Involved in the Analysis of Variance," Journal of the Royal Statistical Society, Vol. 94 (1931), pp. 287f.; and R. A. Fisher, "On the Distribution Yielding the Error Functions of Several Well-known Statistics," Proceedings of the International Mathematical Congress (Toronto, 1924), pp. 805-813. The accompanying figure, numbered 34, is a picture of the z distribution for $n_1 = 4$, $n_2 = 3$. It will be noted that the logarithmic transformation produces a much more symmetrical distribution.


Fig. 34.—The z curve for n₁ = 4, n₂ = 3.
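The peak, mean, and standard deviation quoted above for the F curve can be gathered into one small Python function. This is a sketch. The conditions it checks (n₁ > 2 for a peak away from zero, n₂ > 2 for the mean, n₂ > 4 for the standard deviation) are the usual requirements for these quantities to exist, added here as assumptions. They explain why no standard deviation is reported for the n₁ = 4, n₂ = 3 curve of Fig. 33.

```python
import math


def f_curve_summary(n1, n2):
    """Return (mode, mean, standard deviation) of the F curve with parameters n1, n2.

    A value of None means the quantity does not exist for these n's.
    """
    # Peak at F = n2(n1 - 2) / [n1(n2 + 2)]; for n1 <= 2 the curve is highest at F = 0.
    mode = n2 * (n1 - 2) / (n1 * (n2 + 2)) if n1 > 2 else 0.0
    # Mean n2 / (n2 - 2) exists only when n2 > 2.
    mean = n2 / (n2 - 2) if n2 > 2 else None
    # Standard deviation exists only when n2 > 4.
    if n2 > 4:
        variance = 2 * n2**2 * (n1 + n2 - 2) / (n1 * (n2 - 2) ** 2 * (n2 - 4))
        sd = math.sqrt(variance)
    else:
        sd = None
    return mode, mean, sd


print(f_curve_summary(4, 3))     # the curve of Fig. 33
print(f_curve_summary(10, 12))
```

For n₁ = 4, n₂ = 3, the function puts the peak at F = .3, which matches the largest ordinates in Table 16.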


CHAPTER VII 


NUMERICAL CALCULATIONS 
FOR FREQUENCY CURVES 


Up to this point the discussion has been primarily concerned with the how and the why of frequency curves, i.e., with theory. This chapter will show how frequency curves may be graphed, how frequencies and probabilities may be computed from such curves, how curves may be fitted to sample data, and how the goodness of fit may be tested. Attention will thus center on numerical rather than on abstract calculations.


GRAPHING FREQUENCY CURVES

It is sometimes desired for instruction purposes or for illustrative purposes to make graphs of some of the better known frequency curves. This section will indicate how such graphs may readily be made. The purpose is to show how a curve may be plotted once the constants of the curve have been determined. The problem of determining numerical values for the constants themselves will be discussed in the third section, on curve fitting.
The Normal Curve. Probably the easiest curve to graph is the standard normal curve. Tables of the ordinates of this curve have been computed for various abscissa values. Such a table is Table VI in the Appendix. This gives values of

$$y = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2}$$

for various values of x/σ. To graph a normal curve it is necessary merely to plot these ordinates at selected values of x/σ and draw a smooth curve through the tops of the ordinates. This has been done in Fig. 31. In making a quick freehand graph it may be noted that the curve is symmetrical, its mean and mode come at x/σ = 0, its points of inflection are at x/σ = ±1, and its tails extend to plus and minus infinity.

The standard normal curve is a normal curve whose mean is zero and whose standard deviation is 1 (since the abscissa scale is in terms of x/σ units). To graph a normal curve with a given mean and standard deviation, it is necessary merely to place the zero point of the standard curve at the mean and to adjust the abscissa scale for the difference in standard deviations. Further details and special adjustments are described in the section on testing goodness of fit of a frequency curve.¹
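When no printed table is available, the ordinates can be computed directly. The Python sketch below evaluates the standard ordinate at u = x/σ. It also places the curve at a given mean and standard deviation through a second function, `normal_ordinate`, whose name is illustrative. Dividing the standard ordinate by σ, so that the total area stays equal to 1, is this sketch's choice of convention; the text leaves the ordinate adjustments to the section on goodness of fit.

```python
import math


def standard_normal_ordinate(u):
    """Ordinate y = exp(-u**2 / 2) / sqrt(2 pi) of the standard normal curve at u = x/sigma."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def normal_ordinate(X, mean, sd):
    """Ordinate of a normal curve with the given mean and standard deviation.

    The standard ordinate is read at u = (X - mean)/sd and divided by sd, so the
    rescaled curve still encloses a total area of 1.
    """
    return standard_normal_ordinate((X - mean) / sd) / sd


# Points for a quick graph of the standard curve: x/sigma from -3 to +3 in half steps.
points = [(u / 2, standard_normal_ordinate(u / 2)) for u in range(-6, 7)]
for u, y in points:
    print(f"{u:5.1f}  {y:.4f}")
```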

The t, χ², and F Curves. For most other curves no tables of ordinates have been computed. Graphs of these curves must therefore be made directly from the equations for the curves. Most of the equations are such that it is easiest to find the logarithms of the ordinates first and then convert these to antilogarithms. This method will now be illustrated for three important nonnormal sampling distributions, viz., the t curve, the χ² curve, and the F curve.


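The equations for these curves are

$$y_t = \frac{\left(\frac{n-1}{2}\right)!}{\sqrt{n\pi}\,\left(\frac{n-2}{2}\right)!}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}} \qquad (1)$$

$$y_{\chi^2} = \frac{1}{2^{n/2}\left(\frac{n-2}{2}\right)!}\,(\chi^2)^{\frac{n-2}{2}}\,e^{-\chi^2/2} \qquad (2)$$

$$y_F = \frac{\left(\frac{n_1+n_2-2}{2}\right)!\; n_1^{n_1/2}\, n_2^{n_2/2}}{\left(\frac{n_1-2}{2}\right)!\left(\frac{n_2-2}{2}\right)!}\cdot\frac{F^{(n_1-2)/2}}{(n_1F+n_2)^{(n_1+n_2)/2}} \qquad (3)$$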


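If logarithms are taken to the base 10, these equations become

$$\log y_t = \log\left(\tfrac{n-1}{2}\right)! - \tfrac{1}{2}\log n - \tfrac{1}{2}\log\pi - \log\left(\tfrac{n-2}{2}\right)! - \tfrac{n+1}{2}\log\left(1 + \tfrac{t^2}{n}\right) \qquad (1')$$

$$\log y_{\chi^2} = -\tfrac{\chi^2}{2}\log e + \tfrac{n-2}{2}\log\chi^2 - \tfrac{n}{2}\log 2 - \log\left(\tfrac{n-2}{2}\right)! \qquad (2')$$

$$\log y_F = \log\left(\tfrac{n_1+n_2-2}{2}\right)! + \tfrac{n_1}{2}\log n_1 + \tfrac{n_2}{2}\log n_2 - \log\left(\tfrac{n_1-2}{2}\right)! - \log\left(\tfrac{n_2-2}{2}\right)! + \tfrac{n_1-2}{2}\log F - \tfrac{n_1+n_2}{2}\log(n_1F + n_2) \qquad (3')$$

¹ See pp. 137-152.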
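Eqs. (1'), (2'), and (3') translate line for line into code. In the Python sketch below, the factorials come from the natural log-gamma function `math.lgamma`, using k! = Γ(k + 1), in place of the printed gamma-function tables. The function names are illustrative.

```python
import math

LOG10_E = math.log10(math.e)


def log_factorial10(k):
    """log10 of k! for integral or fractional k, using k! = Gamma(k + 1)."""
    return math.lgamma(k + 1) * LOG10_E


def log_y_t(t, n):
    """Eq. (1'): log10 of the t-curve ordinate."""
    return (log_factorial10((n - 1) / 2) - 0.5 * math.log10(n) - 0.5 * math.log10(math.pi)
            - log_factorial10((n - 2) / 2) - (n + 1) / 2 * math.log10(1 + t * t / n))


def log_y_chi2(chi2, n):
    """Eq. (2'): log10 of the chi-square-curve ordinate (chi2 > 0)."""
    return (-chi2 / 2 * LOG10_E + (n - 2) / 2 * math.log10(chi2)
            - n / 2 * math.log10(2) - log_factorial10((n - 2) / 2))


def log_y_F(F, n1, n2):
    """Eq. (3'): log10 of the F-curve ordinate (F > 0)."""
    return (log_factorial10((n1 + n2 - 2) / 2) + n1 / 2 * math.log10(n1)
            + n2 / 2 * math.log10(n2) - log_factorial10((n1 - 2) / 2)
            - log_factorial10((n2 - 2) / 2) + (n1 - 2) / 2 * math.log10(F)
            - (n1 + n2) / 2 * math.log10(n1 * F + n2))


# Ordinates are recovered as antilogarithms, exactly as in the text.
print(10 ** log_y_t(0, 10), 10 ** log_y_chi2(2, 4), 10 ** log_y_F(1, 4, 3))
```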


In using these equations it is to be noted that log₁₀ 2 = .301030, log₁₀ e = .434294, and log₁₀ π = .497150; logarithms of factorials can be obtained from tables of the gamma function* given in Tracts for Computers (Cambridge University Press, London, 1921) No. IV. For small values of n, n₁, and n₂, the factorials may easily be computed directly. For example, if n = 8, then ((n − 2)/2)! = 3! = 3 × 2 × 1 = 6. If n = 7, then ((n − 2)/2)! = (5/2)! = (5/2)(3/2)(1/2)√π.† For selected values of n or n₁ and n₂, Eqs. (1'), (2'), and (3') will give values of log y for various values of t, χ², and F, and values of y can be computed by taking antilogarithms. Tables 14 to 16 illustrate this process for each of these three curves, and Figs. 31 to 33 of Chap. VI show the final graphs.‡
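The two worked factorials and the three logarithms quoted above can be checked with Python's `math.gamma`. The check relies on k! = Γ(k + 1), which, as the footnote below states, holds for fractional as well as integral k. A short sketch:

```python
import math


def factorial(k):
    """k! for integral or fractional k, computed as Gamma(k + 1)."""
    return math.gamma(k + 1)


# n = 8: ((n - 2)/2)! = 3! = 3 x 2 x 1
f8 = factorial((8 - 2) / 2)
# n = 7: ((n - 2)/2)! = (5/2)! = (5/2)(3/2)(1/2) sqrt(pi)
f7 = factorial((7 - 2) / 2)
f7_by_hand = (5 / 2) * (3 / 2) * (1 / 2) * math.sqrt(math.pi)

# The base-10 logarithms used with Eqs. (1'), (2'), and (3'), to six places.
logs = {name: round(math.log10(v), 6) for name, v in [("2", 2), ("e", math.e), ("pi", math.pi)]}
print(f8, f7, f7_by_hand, logs)
```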


* By definition (n − 1)! = Γ(n) for both integral and fractional values of n.

† See pp. 79n., 133-134.

‡ Ordinates for the z distribution may be obtained from the F distribution as follows. For the values of F for which ordinates have been computed, the corresponding values of z may be obtained from the relationship

$$z = \tfrac{1}{2}\log_e F = 1.1513\log_{10} F$$

The z ordinates for these values may be computed by multiplying the corresponding F ordinates by $2e^{2z}$, or the logarithms of the z ordinates may be obtained by adding log₁₀ 2 + log₁₀ F to the logarithms of the F ordinates. If the z ordinates have to be computed directly, the equation to be used is

$$\log y_z = \log 2 + \log\left(\tfrac{n_1+n_2-2}{2}\right)! + \tfrac{n_1}{2}\log n_1 + \tfrac{n_2}{2}\log n_2 + n_1 z\log_{10} e - \log\left(\tfrac{n_1-2}{2}\right)! - \log\left(\tfrac{n_2-2}{2}\right)! - \tfrac{n_1+n_2}{2}\log\left(n_1e^{2z} + n_2\right)$$

Values of $e^{2z}$ can be found from tables of $e^x$ given in most books of mathematical tables. This was how the figure in the footnote to p. 114 was derived.
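The footnote's conversion from F ordinates to z ordinates can be sketched in Python as follows. Because F = e^(2z), a step dz corresponds to dF = 2e^(2z) dz; this is where the multiplier 2e^(2z) comes from. The function names and the sample ordinate below are hypothetical.

```python
import math


def z_from_F(F):
    """z = (1/2) log_e F, the same as 1.1513 log10 F."""
    return 0.5 * math.log(F)


def z_ordinate(F, y_F):
    """z-curve ordinate at z = z_from_F(F), from the F-curve ordinate y_F at F.

    F = e**(2z), so dF/dz = 2 e**(2z) and y_z = 2 e**(2z) * y_F.
    """
    return 2 * math.exp(2 * z_from_F(F)) * y_F


def log_z_ordinate(F, log_y_F):
    """Logarithmic form: log10 y_z = log10 y_F + log10 2 + log10 F."""
    return log_y_F + math.log10(2) + math.log10(F)


# Illustrative F ordinate of 0.25 at F = 2 (a made-up value, not a table entry).
y_z = z_ordinate(2.0, 0.25)
y_z_from_logs = 10 ** log_z_ordinate(2.0, math.log10(0.25))
print(round(z_from_F(2.0), 4), y_z, y_z_from_logs)
```

The two routes agree, which is a quick check on whichever one is used for a full table.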
therefore need only know how to use these tables in order to
compute the desired probabilities; it is not necessary for him to 
be an accomplished mathematician. 


TABLE 16.—CALCULATION OF THE ORDINATES OF THE F DISTRIBUTION (n₁ = 4, n₂ = 3)

| (1) F | (2) log F | (3) ((n₁ − 2)/2) log F | (4) n₁F + n₂ | (5) log (n₁F + n₂) | (6) ((n₁ + n₂)/2) log (n₁F + n₂) | (7) log y_F = K_F* + (3) − (6) | (8) y_F |
|---|---|---|---|---|---|---|---|
| .01 | −2.000000 | −2.000000 | 3.04 | .482874 | 1.690059 | 9.142335 − 10 | .139 |
| .02 | −1.698970 | −1.698970 | 3.08 | .488551 | 1.709928 | 9.323596 − 10 | .211 |
| .05 | −1.301030 | −1.301030 | 3.20 | .505150 | 1.768025 | 9.663239 − 10 | .461 |
| .10 | −1.000000 | −1.000000 | 3.40 | .531479 | 1.860176 | 9.872218 − 10 | .745 |
| .20 | −.698970 | −.698970 | 3.80 | .579784 | 2.029244 | 0.004180 | 1.010 |
| .30 | −.522879 | −.522879 | 4.20 | .623249 | 2.176372 | 0.033143 | 1.079 |
| .40 | −.397940 | −.397940 | 4.60 | .662758 | 2.319153 | 0.015301 | 1.035 |
| .50 | −.301030 | −.301030 | 5.00 | .698970 | 2.446395 | 9.984969 − 10 | .966 |
| .60 | −.221849 | −.221849 | 5.40 | .732394 | 2.566379 | 9.944166 − 10 | .879 |
| .70 | −.154902 | −.154902 | 5.80 | .763428 | 2.671998 | 9.905496 − 10 | .804 |
| .80 | −.096910 | −.096910 | 6.20 | .792392 | 2.773372 | 9.862112 − 10 | .728 |
| .90 | −.045757 | −.045757 | 6.60 | .819544 | 2.867854 | 9.818783 − 10 | .659 |
| 1.00 | 0 | 0 | 7.00 | .845098 | 2.957843 | 9.774551 − 10 | .595 |
| 1.10 | .041393 | .041393 | 7.40 | .869232 | 3.042312 | 9.731475 − 10 | .539 |
| 1.20 | .079181 | .079181 | 7.80 | .892094 | 3.122329 | 9.689246 − 10 | .489 |
| 1.30 | .113943 | .113943 | 8.20 | .913814 | 3.198349 | 9.647988 − 10 | .445 |
| 1.50 | .176091 | .176091 | 9.00 | .954243 | 3.339851 | 9.568634 − 10 | .370 |
| 1.70 | .230449 | .230449 | 9.80 | .991226 | 3.469291 | 9.493552 − 10 | .312 |
| 2.00 | .301030 | .301030 | 11.00 | 1.041393 | 3.644876 | 9.389647 − 10 | .245 |
| 2.50 | .397940 | .397940 | 13.00 | 1.113943 | 3.898801 | 9.231533 − 10 | .170 |
| 3.00 | .477121 | .477121 | 15.00 | 1.176091 | 4.116313 | 9.093202 − 10 | .124 |
| 3.50 | .540680 | .540680 | 17.00 | 1.230449 | 4.306572 | 8.966502 − 10 | .093 |
| 4.00 | .602060 | .602060 | 19.00 | 1.278754 | 4.475639 | 8.858815 − 10 | .072 |
| 5.00 | .698970 | .698970 | 23.00 | 1.361728 | 4.766048 | 8.665415 − 10 | .046 |
| 10.00 | 1.000000 | 1.000000 | 43.00 | 1.633468 | 5.717138 | 8.015256 − 10 | .010 |

* K_F = log((n₁ + n₂ − 2)/2)! + (n₁/2) log n₁ + (n₂/2) log n₂ − log((n₁ − 2)/2)! − log((n₂ − 2)/2)! = 2.732394.
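The columns of Table 16 follow mechanically from Eq. (3'), so they are easy to regenerate. The Python sketch below computes the constant K_F from the factorials with `math.lgamma` rather than copying it. Computed that way, K_F comes out near 2.4938, which is below the 2.732394 printed under the table by almost exactly log √3 (about .2386). The ordinates from the sketch are therefore those of column (8) divided by √3 (about .344 at F = 1), and these are the ordinates that enclose a total area of 1.

```python
import math


def log10_factorial(k):
    """log10 of k!, with k! = Gamma(k + 1) for integral or fractional k."""
    return math.lgamma(k + 1) / math.log(10)


def f_ordinate_table(F_values, n1=4, n2=3):
    """Rows (F, col3, col4, col6, log y_F, y_F) laid out like Table 16."""
    # Constant terms of Eq. (3'), recomputed from the factorials.
    K = (log10_factorial((n1 + n2 - 2) / 2) + n1 / 2 * math.log10(n1)
         + n2 / 2 * math.log10(n2) - log10_factorial((n1 - 2) / 2)
         - log10_factorial((n2 - 2) / 2))
    rows = []
    for F in F_values:
        col3 = (n1 - 2) / 2 * math.log10(F)        # ((n1 - 2)/2) log F
        col4 = n1 * F + n2                          # n1 F + n2
        col6 = (n1 + n2) / 2 * math.log10(col4)    # ((n1 + n2)/2) log(n1 F + n2)
        log_y = K + col3 - col6                     # Eq. (3')
        rows.append((F, col3, col4, col6, log_y, 10 ** log_y))
    return K, rows


K, rows = f_ordinate_table([0.01, 0.3, 1.0, 3.0, 10.0])
print(f"K_F = {K:.6f}")
for F, col3, col4, col6, log_y, y in rows:
    print(f"{F:6.2f} {col3:10.6f} {col4:6.2f} {col6:9.6f} {y:7.4f}")
```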


The Normal Curve. Table VI in the Appendix gives the area under the normal curve between the mean point x/σ = 0 and selected values of x/σ. Since the curve is symmetrical, the areas are the same for plus deviations as for minus deviations.

To illustrate the use of the table, consider a normal curve whose mean is 100 and whose standard deviation is 10. Since the mean of the standard normal curve is taken as its origin, 100 will in this case be the origin from which deviations will be measured. Since 10 is the standard deviation, the deviations from the mean will be measured in multiples of 10. To find the probability of a case falling in the interval 100 to 120, for example, set

$$\frac{x}{\sigma} = \frac{120 - 100}{10} = 2$$

and note from Table VI of the Appendix that the probability of a normally distributed variate falling between 0 and 2σ from the mean is .477. Hence the probability of a case falling between 100 and 120 is .477. To find the probability of a case falling between 90 and 100, set

$$\frac{x}{\sigma} = \frac{90 - 100}{10} = -1$$

and note that the probability of a normally distributed variate falling between 0 and −1σ (the same as between 0 and +1σ) is .341. Hence the probability of a case falling between 90 and 100 is .341. To find the probability of a case falling between 90 and 120, it is necessary merely to add the probability of a case falling between 90 and 100 to the probability of a case falling between 100 and 120. Thus the probability of a case falling between 90 and 120 is .477 + .341 = .818. To find the probability of a case lying between 110 and 120, it is necessary merely to subtract the probability of a case falling between 100 and 110 from the probability of a case lying between 100 and 120. Since the probability of a case falling between 100 and 110 is the same as the probability of a case falling between 90 and 100, it follows that the probability of a case falling between 110 and 120 is

$$.477 - .341 = .136$$


These computations are pictured in Figs. 35a to 35d. 
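When no table is at hand, the same areas can be obtained from the error function. The area to the left of X under a normal curve is ½[1 + erf((X − X̄)/(σ√2))]. The Python sketch below reproduces the four worked probabilities with the mean of 100 and standard deviation of 10 used above; the function names are illustrative.

```python
import math


def normal_cdf(X, mean, sd):
    """Area under a normal curve to the left of X, via the error function."""
    return 0.5 * (1 + math.erf((X - mean) / (sd * math.sqrt(2))))


def prob_between(a, b, mean=100.0, sd=10.0):
    """Probability that a case falls between a and b (a < b)."""
    return normal_cdf(b, mean, sd) - normal_cdf(a, mean, sd)


for a, b in [(100, 120), (90, 100), (90, 120), (110, 120)]:
    print(f"{a}-{b}: {prob_between(a, b):.4f}")
```

The sketch gives .4772, .3413, .8186, and .1359. The figure for 90 to 120 rounds to .819 rather than .818 because the text adds the two table entries after each has been rounded to three places.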


The t Curve. Normal curves may be drawn with different means and different standard deviations. When the variable is measured in standard deviation units from the mean as an origin, however, all normal curves become one and the same curve, i.e., the standard normal curve. It was therefore possible to construct a simple table that gave areas under the curve (i.e., probabilities) for various x/σ deviations from the mean.

The t curve is invariably used in the standard form. Even in that form, however, its shape depends on the parameter n. Hence areas under the curve, i.e., the curve probabilities, will be different for different values of n. A t table is thus a threefold table, listing the areas or probabilities that correspond to different deviations from the mean value for various values of n.

A typical t table will be found in the Appendix, Table VII. In contrast to the normal table, the deviations from the mean are given for selected probabilities, rather than the reverse. The selected probabilities are probabilities of an equal or greater absolute deviation, not probabilities of an equal or smaller deviation, as is true of the normal table. That is, the t table selects tail areas. Since the curve is symmetrical, each such area is equally divided between the two tails.

Fig. 36.—Equal probability areas of .025 in the tails of the t curve for n = 10.

An example will illustrate the use of the table. Suppose the problem is to find the deviation for which the probability of an equal or greater absolute deviation is .05 when n = 10. To find this deviation, locate the row n = 10, and proceed to the right until the column marked .05 is reached. The figure so located is the deviation desired. It will be seen to have the value 2.228. This means that a deviation of +2.228 will mark off an area of .025 on the upper tail and a deviation of −2.228 will mark off an area of .025 on the lower tail; together the two deviations mark off a total area of .05. This is illustrated in Fig. 36.
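The t table entries themselves can be approximated when the printed table is not available. The Python sketch below integrates Eq. (1) numerically with Simpson's rule. It then uses bisection to find the deviation whose two-tail area equals a chosen probability. Both numerical methods are choices made for this sketch; they are not the way the printed table was constructed. The upper search limit of 20 is an assumption that suits the usual table entries.

```python
import math


def t_ordinate(t, n):
    """Ordinate of the t curve, Eq. (1), with the factorials written as gamma functions."""
    c = math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2)) / math.sqrt(n * math.pi)
    return c * (1 + t * t / n) ** (-(n + 1) / 2)


def two_tail_area(t, n, steps=2000):
    """Probability of an equal or greater absolute deviation: 1 - 2 x (area from 0 to t)."""
    h = t / steps
    s = t_ordinate(0.0, n) + t_ordinate(t, n)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_ordinate(k * h, n)   # Simpson weights 4, 2, 4, ...
    return 1 - 2 * s * h / 3


def t_point(p, n, lo=0.0, hi=20.0):
    """Deviation whose two-tail area is p, located by bisection."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_tail_area(mid, n) > p:
            lo = mid      # tail area still too large: move outward
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_point(0.05, 10), 3))
```

For n = 10 and a total tail area of .05 this gives 2.228, the entry read from the table above.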
The χ² Curve. The χ² table, Table VIII in the Appendix, is much like the t table. It is again a threefold table, giving areas (or probabilities) and deviations for various values of n. Also, like the t table, it lists deviations for selected areas or probabilities, and these areas refer to the portion of the curve beyond the deviation. That is, the areas represent probabilities of an equal or greater deviation. Unlike the t table, the deviations are measured from the absolute origin of 0 and not from the mean of the curve.

Fig. 37a.—The 2 per cent area in the upper tail of a χ² curve, n = 4 [.02 is the P(χ² ≥ 11.668)].
To illustrate the use of the χ² table, let n = 4, and let the problem be to find the deviation from 0 for which the probability of an equal or greater deviation is .02. To find this deviation, locate the row n = 4, and proceed to the right until the column headed .02 is reached. The figure so located is the deviation desired. It will be seen to have the value 11.668 (see Fig. 37a).

Suppose, instead, that the problem is to find the deviation from 0 for which the area in the lower tail is .02, that is, the deviation for which the probability of an equal or smaller value is just .02. Let n be 4 as before, locate the row n = 4, and proceed to the right until the column headed .98 is reached. The figure so located will be the desired deviation. It will be seen to have the value .429. That is, the deviation for which 98 per cent of the curve lies to the right is also the deviation for which 2 per cent of the curve lies to the left (see Fig. 37b). It will be noted that the medians of the various χ² curves are the deviations in the column headed .50* (see Fig. 37c).

Fig. 37b.—The 2 per cent area in the lower tail of a χ² curve, n = 4 [.98 is the P(χ² ≥ 0.429)].

Fig. 37c.—The median point of a χ² curve, n = 4 [.50 is the P(χ² ≥ 3.357)].

* The mean, it will be recalled, is n and the mode n − 2. See Chap. VI.
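When n is even, the area of the χ² curve beyond a point has the closed form e^(−χ²/2)[1 + χ²/2 + (χ²/2)²/2! + . . .] with n/2 terms. This makes the n = 4 entries easy to check. The Python sketch below recovers them by bisection; it handles even n only, an assumption made to keep the example short.

```python
import math


def chi2_upper_area(x, n):
    """P(chi-square >= x) for an even number n, using the closed-form finite sum."""
    if n % 2:
        raise ValueError("this sketch handles even n only")
    half = x / 2
    term, total = 1.0, 0.0
    for k in range(n // 2):
        if k > 0:
            term *= half / k          # (x/2)**k / k!
        total += term
    return math.exp(-half) * total


def chi2_point(p, n, lo=0.0, hi=200.0):
    """Deviation from 0 whose upper-tail area is p, located by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if chi2_upper_area(mid, n) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for p in (0.02, 0.50, 0.98):
    print(p, round(chi2_point(p, 4), 3))
```

For n = 4 the sketch returns about 11.668, 3.357, and .429 for the columns headed .02, .50, and .98. The value .711 belongs under .95; it is the point with 5 per cent of the area to its left.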
The F Curve. The F table (see Appendix, Table IX) is like the χ² table except that it takes account of two parameters, n₁ and n₂, instead of one. It is thus a fourfold instead of a threefold table. Because of this greater complexity, it gives values for only two tail areas or probabilities, viz., the upper .05 tail and the upper .01 tail.

The use of the F table may be illustrated by the following problem: Let n₁ = 4 and n₂ = 3, and let it be desired to find the deviation for which the tail area (or probability of an equal or greater value) is .05. To find this deviation, first select the row n₂ = 3, and proceed to the right until the column headed n₁ = 4 is reached. The lightface figure so located (the adjacent boldface figure is the “.01 point”) is the deviation desired. The “.05 point” is seen to be 9.12; it is shown in Fig. 38. If the problem were to find the deviation for which the tail area, or probability of an equal or greater value, is just .01, then the “.01 point” would have to be used; the rest of the procedure would be the same.

Fig. 38.—Probability area of .05 in the upper tail of an F curve, n₁ = 4, n₂ = 3 [.05 is the P(F ≥ 9.12)].
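The F table's .05 and .01 points can be approximated from Eq. (3) in the same way. The Python sketch below integrates the ordinates from 0 to a trial value with Simpson's rule and bisects on the upper-tail area. It assumes n₁ ≥ 2, so that the ordinate at F = 0 is finite. Simpson's rule and bisection are this sketch's choices, not the method behind the printed table.

```python
import math


def f_log_constant(n1, n2):
    """Natural log of the constant factor of Eq. (3), factorials written as gamma functions."""
    return (math.lgamma((n1 + n2) / 2) - math.lgamma(n1 / 2) - math.lgamma(n2 / 2)
            + n1 / 2 * math.log(n1) + n2 / 2 * math.log(n2))


def f_ordinate(F, n1, n2, log_c):
    """Ordinate of the F curve, Eq. (3), given the precomputed log constant."""
    return math.exp(log_c) * F ** ((n1 - 2) / 2) / (n1 * F + n2) ** ((n1 + n2) / 2)


def upper_tail(x, n1, n2, steps=2000):
    """Probability of an equal or greater value: 1 - (area from 0 to x by Simpson's rule)."""
    log_c = f_log_constant(n1, n2)
    h = x / steps
    s = f_ordinate(0.0, n1, n2, log_c) + f_ordinate(x, n1, n2, log_c)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f_ordinate(k * h, n1, n2, log_c)
    return 1 - s * h / 3


def f_point(p, n1, n2, lo=0.0, hi=200.0):
    """Value of F whose upper-tail area is p, located by bisection."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, n1, n2) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(f_point(0.05, 4, 3), 2), round(f_point(0.01, 4, 3), 2))
```

For n₁ = 4, n₂ = 3 the sketch gives about 9.12 and 28.71, the lightface and boldface entries in the table.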
Other Curves. Special tables have been derived for the computation of areas and probabilities for curves belonging to the Gram-Charlier system. These may best be explained, however, in later sections¹ dealing with testing the goodness of fit of a Gram-Charlier curve. It can be remarked here that the tables are generally similar to the table of the normal curve. The tables of the normal, t, χ², and F curves, together with the tables used in computing probabilities for a Gram-Charlier curve, comprise the most commonly used tables of areas.

When tables are lacking, the areas or probabilities must be computed directly or indirectly from the mathematical equation for the curve. If the equation is a simple one, the area may be found by the application of the integral calculus. Often, however, the equations for frequency curves make it difficult to use the integral calculus, and some approximate method must be employed. If an integraph is available, the curve need only be plotted carefully and the integraph run around the desired area. If an integraph is not at hand, the area may be approximated from some “quadrature” equation that expresses the area of a given interval in terms of the ordinates of the curve for that interval and for the neighboring intervals.

¹ See pp. 139-150.

Some of the more important quadrature equations given by W. P. Elderton are as follows:¹

Area under the curve from x = −½ to x = +½ approximates

$$y_0 - \frac{1}{24}\left(\Delta y_{-1} - \Delta y_0\right) \qquad (4)$$

or, if greater accuracy is desired, a nearer approximation is obtained by using

$$y_0 - \frac{291}{5760}\left(\Delta y_{-1} - \Delta y_0\right) + \frac{17}{5760}\left(\Delta y_{-2} - \Delta y_1\right) \qquad (5)$$

In these expressions, x = 0 is the mid-point of the interval for which the area is to be computed, and x = −½ and x = +½ are the values of x at the ends of the interval. The units are class intervals. Δy₀ means the difference between the ordinate y₁ at x = 1 (i.e., the ordinate at the mid-point of the next higher interval) and the ordinate y₀ at x = 0; Δy₁ means the difference between the ordinate y₂ at the mid-point of the second higher interval and the ordinate y₁ at the mid-point of the next higher interval; Δy₋₁ means the difference between the ordinate at x = 0 and the ordinate at x = −1 (i.e., the ordinate at the mid-point of the next lower interval); and Δy₋₂ means the difference between the ordinate at the mid-point of the next lower interval and the ordinate at the mid-point of the second lower interval. In short, if y₂, y₁, y₀, y₋₁, and y₋₂ represent the ordinates at x = 2, x = 1, x = 0, x = −1, and x = −2, then Δy₁ = y₂ − y₁, Δy₀ = y₁ − y₀, Δy₋₁ = y₀ − y₋₁, and Δy₋₂ = y₋₁ − y₋₂.

¹ Frequency Curves and Correlation, pp. 25-26, 48.

To illustrate the use of these equations, suppose that the ordinates of a frequency curve for five values of the variable are as follows¹ (see Fig. 39):


x y 


105 1.21 
115 3.40 
125 16.10 
135 47.57 
145 78.87 


Before proceeding further, it is to be noted that the frequency curve from which these ordinates have been taken was drawn so that the curve would fit a histogram in which the frequency of each class was represented by the heights of the rectangles and not by their area. That is, the ordinates are actually one class interval (in this case the class interval is ten) times greater than they should be if the area under the curve is to be correct. It may also be noted that in this case the curve gives the distribution of 300 cases and is drawn so that the total area is, not 1, but 300.
To find the area under the given curve for the interval whose mid-point is 125, set up the following table:

y        Δy
1.21     Δy₋₂ = y₋₁ − y₋₂ = 3.40 − 1.21 = 2.19
3.40     Δy₋₁ = y₀ − y₋₁ = 16.10 − 3.40 = 12.70
16.10    Δy₀ = y₁ − y₀ = 47.57 − 16.10 = 31.47
47.57    Δy₁ = y₂ − y₁ = 78.87 − 47.57 = 31.30
78.87

The ordinates are shown graphically in Fig. 39.

By using Eq. (4), a first approximation to the area will be given by

$$16.10 - \frac{1}{24}(12.70 - 31.47) = 16.10 + .78 = 16.88$$


¹ These are the ordinates of a Gram-Charlier curve fitted to the weights of 300 Princeton freshmen (see pp. 143-145). It will be interesting to compare the area calculated here with that obtained from the tables of probabilities for a Gram-Charlier curve (see p. 147).
Equation (5) gives as a second approximation¹

$$16.10 - \frac{291}{5760}(12.70 - 31.47) + \frac{17}{5760}(2.19 - 31.30) = 16.96$$

Since the original ordinates were one class interval times as large as they should be (if the area under the curve was to equal the total frequency), there is no need in this case to multiply the 16.88 or 16.96 by the class interval. Hence the final answers are 16.88 and 16.96 cases. If these are divided by 300 (the total number of cases), the results are .0563 and .0565, which are the first and second approximations to the probability (relative frequency) that a case falls in the interval 120-130.

Fig. 39.—Five ordinates of the curve fitted to the weights of 300 Princeton freshmen.
It will be noted that the initial term in each of Eqs. (4) and (5) is the ordinate of the curve at the mid-point of the interval for which the area is to be calculated. This means that the area of the rectangle whose height is the ordinate of the curve at the mid-point of the interval and whose base is the class interval may be taken as a first approximation to the area under the curve over that interval. That is, it is assumed that the curve can be roughly approximated for the given interval by the horizontal straight line y = y₀. The extra terms added in Eq. (4) imply that the curve can be better approximated by a second-degree parabola drawn through the ordinates of the curve at x = 1, x = 0, and x = −1, that is, through the ordinates y₁, y₀, and y₋₁. Equation (5) assumes that a still better approximation to the curve can be obtained by a fourth-degree parabola drawn through the ordinates of the curve at x = 2, x = 1, x = 0, x = −1, and x = −2, that is, through y₂, y₁, y₀, y₋₁, and y₋₂.
Elderton also gives equations, based upon these same degrees of approximation, that express the area over an interval in terms of the ordinates of the curve at the ends instead of the mid-points of that interval and of neighboring class intervals.

¹ The area computed later from a table of probabilities for a Gram-Charlier curve is 16.98, which shows how good this approximation is.
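Eqs. (4) and (5) need only the five ordinates and their first differences, so they are easily mechanized. The Python sketch below uses a hypothetical function name. It takes the ordinates in the order y₋₂, y₋₁, y₀, y₁, y₂ at unit (class-interval) spacing and returns both approximations. Applied to the five ordinates of the example, it gives the 16.88 and 16.96 found above.

```python
def elderton_areas(y):
    """First and second approximations, Eqs. (4) and (5), to the area over the
    middle interval; y = [y_-2, y_-1, y_0, y_1, y_2] at unit (class-interval) spacing."""
    ym2, ym1, y0, y1, y2 = y
    d_m2 = ym1 - ym2    # delta y_-2
    d_m1 = y0 - ym1     # delta y_-1
    d_0 = y1 - y0       # delta y_0
    d_1 = y2 - y1       # delta y_1
    first = y0 - (d_m1 - d_0) / 24
    second = y0 - 291 / 5760 * (d_m1 - d_0) + 17 / 5760 * (d_m2 - d_1)
    return first, second


first, second = elderton_areas([1.21, 3.40, 16.10, 47.57, 78.87])
print(round(first, 2), round(second, 2))                 # cases in the interval 120-130
print(round(first / 300, 4), round(second / 300, 4))     # relative frequencies
```

Eq. (4) rests on a second-degree parabola and Eq. (5) on a fourth-degree one. Consequently, the first is exact for ordinates taken from y = x² ([4, 1, 0, 1, 4] gives 1/12) and the second for ordinates taken from y = x⁴ ([16, 1, 0, 1, 16] gives 1/80). These are the true areas from x = −½ to x = +½.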


FITTING FREQUENCY CURVES 


To fit a frequency curve to a given set of sample data means
to determine numerical values for the constants of the curve 
equation. This is usually done by relating the curve constants 
to the various statistics of the data. The problem will now be 
considered with reference to the more important types of curves 
that are fitted to sample data. 

The Normal Curve. When the variable is expressed as an absolute deviation from the zero origin, the equation for the normal curve is

$$y = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(X-\bar{X})^2}{2\sigma^2}\right] \qquad (6)$$

where X̄ and σ are the mean and standard deviation of the curve.
It would appear, therefore, that the normal curve could be fitted 
to a set of sample data by merely putting the mean of the data 
and the standard deviation of the data in the curve equation. 
This is essentially the method that is used. A particular adjustment must be made, however, whenever the sample standard deviation is computed from grouped data. This adjustment will now be discussed.

When data are grouped for the purpose of calculating a mean, standard deviation, or other statistic, a certain arbitrary distortion may result from considering all the cases of a given class interval as being concentrated at the center of the interval or, what amounts to the same thing, as being uniformly distributed throughout the interval. If there were enough cases in an interval, it would probably be found that the frequencies of the cases in the interval would actually form a smooth curve that tended to rise toward the center of the distribution (see Fig. 40a) instead of forming a horizontal line as assumed by the grouping of the data (see Fig. 40b).


¹ See Elderton, W. P., op. cit., pp. 25-26, 48.

The error that results from grouping is generally negligible in the case of odd-powered moments such as the mean, since errors in measuring positive deviations tend to be offset by errors in measuring negative deviations. For even-powered moments such as the variance, however, the errors of measurement are cumulative and result in a net error due to grouping. W. F. Sheppard¹ has put the error in the variance at i²/12, where i is the size of the class interval.

Before the normal curve is fitted to a sample histogram, therefore, the variance of the sample, when calculated from grouped data, must be corrected for grouping by subtracting i²/12. The square root of the corrected variance will give the corrected standard deviation. When the mean of the data and this corrected value for the standard deviation are put in the general formula for the normal curve, the result will be a normal curve that has the same mean and same standard deviation as the given data.²

Fig. 40a.—Actual distribution of frequencies in a class interval.
Fig. 40b.—Assumed distribution of frequencies in a class interval.

Whether the curve actually fits the data depends on how truly normal the data are. The goodness of fit can usually be made evident by plotting the normal curve on the same chart as the histogram of the data.³ If the fit or lack of fit is somewhat doubtful, various methods may be employed to test the goodness of fit. The χ² test of goodness of fit is discussed below.⁴
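The normal fitting procedure, including Sheppard's correction, can be sketched in a few lines of Python. The grouped data in the example are hypothetical, and the variance uses the divisor N, as the footnote on N/(N − 1) allows for large samples. Heights on the histogram's scale are obtained by multiplying the ordinates by N·i (total frequency times class interval); this scaling is an assumption of the sketch for histograms whose rectangle heights are class frequencies.

```python
import math


def fit_normal_grouped(midpoints, freqs):
    """Mean and Sheppard-corrected standard deviation of grouped data.

    Assumes equally spaced class mid-points; i is the class interval.
    """
    N = sum(freqs)
    i = midpoints[1] - midpoints[0]
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / N
    # Divisor N, with no N/(N - 1) adjustment.
    var_grouped = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / N
    var_corrected = var_grouped - i * i / 12      # Sheppard's correction for grouping
    return mean, math.sqrt(var_corrected), N, i


def curve_heights(midpoints, mean, sd, N, i):
    """Heights of the fitted normal curve at the mid-points, on the histogram's scale."""
    return [N * i * math.exp(-0.5 * ((m - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
            for m in midpoints]


# Hypothetical grouped data: class mid-points and frequencies.
mids = [1, 2, 3, 4, 5]
freqs = [1, 4, 6, 4, 1]
mean, sd, N, i = fit_normal_grouped(mids, freqs)
heights = curve_heights(mids, mean, sd, N, i)
print(mean, round(sd, 4), [round(h, 2) for h in heights])
```

Plotting `heights` over the histogram gives the graphic comparison described above.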

Nonnormal Curves. The reason for fitting a normal curve to a given set of data is to determine whether the data are or are not normally distributed. If the distribution is normal, then the


¹ See Proceedings of the London Mathematical Society, Vol. 29 (1898), p. 353.

² In fitting a frequency curve the data are generally very numerous, so the multiplication of the sample variance by N/(N − 1) to improve the estimate of the population variance affects the result but little and can therefore be neglected.

³ See pp. 137-139.

⁴ See pp. 139-150.
normal table can be used to calculate probabilities and the results
of sampling can be predicted by the carefully worked out sampling 
theory applicable to normal populations. 

When data are distinguished as nonnormal, there may be 
further advantages in fitting a nonnormal curve to the data. 
Such a curve, for example, may serve to “smooth” the histogram 
and may thus permit a more accurate determination of the rela- 
tive frequencies of the population from which the sample was 
taken. The identification of the distribution of given data with 
a particular frequency curve may also serve to distinguish them 
from other data the distribution of which is identified with a 
different frequency curve. The “fitting” of frequency curves 
may thus help to classify data as to type. These are the two 
principal reasons for fitting nonnormal frequency curves. 

A Gram-Charlier Curve. The usual type of nonnormal curve fitted to data of everyday life is either a Pearsonian curve or a Gram-Charlier curve. Since the latter is the easier to fit, it will be discussed first.

In the case of the normal curve, the curve constants were themselves commonly computed statistics, viz., the mean and standard deviation of the data (the latter corrected for errors of grouping). In the case of a Gram-Charlier curve, the general frequency equation is*

$$y = \frac{1}{\sigma\sqrt{2\pi}}\left[1 + \frac{A}{6}\left(\frac{3x}{\sigma} - \frac{x^3}{\sigma^3}\right) + \frac{B}{24}\left(3 - \frac{6x^2}{\sigma^2} + \frac{x^4}{\sigma^4}\right)\right]\exp\left[-\frac{x^2}{2\sigma^2}\right] \qquad (7)$$

where x = X − X̄ and

$$A = -\frac{\mu_3}{\sigma^3}, \qquad B = \frac{\mu_4}{\sigma^4} - 3$$

Hence the constants of the curve are again functions of commonly computed statistics, viz., the mean X̄, the standard deviation σ, the third moment μ₃, and the fourth moment μ₄. Therefore, a Gram-Charlier curve can be fitted to a set of sample data by substituting the sample values for the population values X̄, σ, μ₃, and μ₄ of the curve equation.

If the sample values of X̄, σ, etc., have been computed from grouped data, as is usually the case, then a correction for grouping must be made in the case of the even-power moments σ² = μ₂ and μ₄. Sheppard's corrections in these instances are

$$\begin{aligned}\mu_2 &= \mu_2\,(\text{uncorrected}) - \tfrac{1}{12}\,i^2\\ \mu_4 &= \mu_4\,(\text{uncorrected}) - \tfrac{1}{2}\,\mu_2\,(\text{uncorrected})\,i^2 + \tfrac{7}{240}\,i^4\end{aligned} \qquad (8)$$

where the μ's stand for the corrected moments and the μ (uncorrected)'s for the uncorrected moments.

* See pp. 92-99.

The goodness of fit of the curve to the data may be examined by simply making a graphic comparison of the curve and the sample histogram to which it is fitted, or a χ² test may be undertaken. These tests of goodness of fit are described and illustrated below.
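Eq. (7) and Sheppard's corrections (8) can be put into code as follows. The Python sketch evaluates the curve in the form of Eq. (7), taking A = −μ₃/σ³ and B = μ₄/σ⁴ − 3 and scaling it to a total area of 1. It illustrates the fitting step only; the function names and the moments in the example are made up.

```python
import math


def sheppard(mu2_raw, mu4_raw, i):
    """Sheppard's corrections, Eqs. (8), for moments about the mean from grouped data."""
    mu2 = mu2_raw - i * i / 12
    mu4 = mu4_raw - mu2_raw * i * i / 2 + 7 * i**4 / 240
    return mu2, mu4


def gram_charlier(X, mean, sd, mu3, mu4):
    """Ordinate of the Gram-Charlier curve, Eq. (7), scaled to a total area of 1."""
    u = (X - mean) / sd
    A = -mu3 / sd**3
    B = mu4 / sd**4 - 3
    bracket = 1 + A / 6 * (3 * u - u**3) + B / 24 * (3 - 6 * u**2 + u**4)
    return bracket * math.exp(-u * u / 2) / (sd * math.sqrt(2 * math.pi))


# Made-up grouped-data moments about the mean, class interval 1.
mu2, mu4 = sheppard(mu2_raw=4.0, mu4_raw=55.0, i=1.0)
sd = math.sqrt(mu2)
ys = [gram_charlier(x, 0.0, sd, 3.0, mu4) for x in range(-6, 7)]
print(round(mu2, 6), round(mu4, 6), [round(y, 4) for y in ys])
```

Only the even moments are corrected; μ₃ is substituted as computed, as the text indicates. When μ₃ = 0 and μ₄ = 3σ⁴, the bracket reduces to 1 and Eq. (7) becomes the normal curve (6).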

A Pearsonian Curve. Karl Pearson, it will be recalled,¹ found that frequency curves in general might be represented by an equation of the type

$$\text{Relative slope} = \frac{1}{y}\frac{dy}{dx} = -\frac{x + a}{b_0 + b_1x + b_2x^2} \qquad (9)$$

The logical basis upon which this system of curves rests was discussed in Chap. IV. It is the purpose here to describe the fitting of such a curve.

The first step is to relate the constants a, b₀, b₁, and b₂ of Eq. (9) to various statistics computed from the sample data. The method by which this is accomplished is explained by W. P. Elderton in his Frequency Curves and Correlation (pages 39 to 40). The essence of the procedure is to determine a, b₀, b₁, and b₂ so that the first four moments of the fitted curve will equal the first four moments of the data. The algebra is elaborate and will not be repeated here. When the results obtained by Elderton's analysis are substituted in Eq. (9), it becomes²

$$\text{Relative slope} = -\frac{x + \dfrac{\sqrt{\mu_2}\sqrt{\beta_1}\,(\beta_2+3)}{2(5\beta_2-6\beta_1-9)}}{\dfrac{\mu_2(4\beta_2-3\beta_1) + \sqrt{\mu_2}\sqrt{\beta_1}\,(\beta_2+3)\,x + (2\beta_2-3\beta_1-6)\,x^2}{2(5\beta_2-6\beta_1-9)}} \qquad (10)$$
¹ See Chap. IV (p. 57).

² Elderton's method involves the assumption that $x^n(b_0 + b_1x + b_2x^2)y$ vanishes at the ends of the range of the distribution, i.e., that there is close contact with the x-axis at both ends.


It will be noted that the mode of the curve is the point at which the relative slope is zero. Equation (10) thus shows that the mode of a Pearsonian curve comes at

$$x = -\frac{\sqrt{\mu_2}\sqrt{\beta_1}\,(\beta_2+3)}{2(5\beta_2-6\beta_1-9)}$$

where x = X − X̄ and √β₁ must have the same sign as μ₃.
This equation is now expressed in terms of μ₂ and the β coefficients of the curve, and the curve can be fitted by substituting the moments of the sample data for the population moments. The values of the second and fourth moments should be adjusted for Sheppard's corrections if they have been computed from grouped data [see Eqs. (8)].

The Pearsonian equation (9), it will be recalled, yields different types of curves, depending on the value of the criterion κ = b₁²/(4b₀b₂).³ When the b's are given their values from Eq. (10), this criterion becomes⁴

$$\kappa = \frac{\beta_1(\beta_2+3)^2}{4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)}$$

The types of curves distinguished on the basis of this equation are as follows:⁵


| Value of criterion | Curve type |
|---|---|
| κ = 0, β₁ = 0, β₂ > 3 | VII |
| κ = 0, β₁ = 0, β₂ = 3 | Normal curve |
| κ = 0, β₁ = 0, 1.8 < β₂ < 3 | IIa |
| κ = 0, β₁ = 0, β₂ < 1.8 | IIb |
| 0 < κ < 1 | IV |
| κ = 1 | V |
| 1 < κ < ∞ | VI |
| κ = ∞, that is, 2β₂ − 3β₁ − 6 = 0 | III |
| κ < 0 | I |


Since x = X − X̄ and √μ₂ = σ,

$$Mo = \bar{X} - \frac{\sigma\sqrt{\beta_1}\,(\beta_2+3)}{2(5\beta_2-6\beta_1-9)}$$

Since skewness can be measured by (X̄ − Mo)/σ,

$$\text{Skewness} = \frac{\sqrt{\beta_1}\,(\beta_2+3)}{2(5\beta_2-6\beta_1-9)}$$

³ See Chap. IV (p. 57).

⁴ Elderton, op. cit., p. 42.

⁵ Taken from Tables for Statisticians and Biometricians, Part I, p. lxiii.
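The criterion and the type table can be applied mechanically. The Python sketch below computes κ from β₁ and β₂ and assigns a type following the table of criterion values. It also gives the skewness (X̄ − Mo)/σ from the footnote formula. The tolerance that decides when a value counts as lying on a boundary, and the split of type II at β₂ = 1.8, are assumptions of this sketch. As the text warns next, sample β's near a boundary may belong on the other side of it.

```python
import math


def pearson_kappa(beta1, beta2):
    """Criterion kappa = beta1 (beta2 + 3)^2 / [4 (4 beta2 - 3 beta1)(2 beta2 - 3 beta1 - 6)]."""
    denom = 4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6)
    return math.inf if denom == 0 else beta1 * (beta2 + 3) ** 2 / denom


def pearson_type(beta1, beta2, tol=1e-9):
    """Curve type from the table of criterion values."""
    if abs(beta1) < tol:                               # symmetrical curves, kappa = 0
        if abs(beta2 - 3) < tol:
            return "normal"
        if beta2 > 3:
            return "VII"
        return "IIb" if beta2 < 1.8 else "IIa"
    if abs(2 * beta2 - 3 * beta1 - 6) < tol:           # kappa infinite
        return "III"
    k = pearson_kappa(beta1, beta2)
    if k < 0:
        return "I"
    if abs(k - 1) < tol:
        return "V"
    return "IV" if k < 1 else "VI"


def pearson_skewness(beta1, beta2, mu3):
    """(mean - mode) / sigma, with sqrt(beta1) given the sign of mu3."""
    root_b1 = math.copysign(math.sqrt(beta1), mu3)
    return root_b1 * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))


for b1, b2 in [(0, 3), (0, 4), (0.5, 3.0), (0.2, 4.0), (1.0, 4.6), (2.0, 6.0)]:
    print(b1, b2, pearson_type(b1, b2))
```

As a check, β₁ = 2 and β₂ = 6 satisfy 2β₂ − 3β₁ − 6 = 0, so they fall under type III. For these values, `pearson_skewness(2.0, 6.0, 1.0)` gives 1/√2.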
It is clear at once from Fig. 41 what type of curve is yielded by usual values of β₁ and β₂; in most cases it is unnecessary to go to the trouble of calculating the criterion value κ. The reader should be warned, however, that, if sample values of β₁ and β₂ are near a border line, there may be a good chance, owing to sampling errors, that the population data are of the type lying on or on the other side of the border rather than of the type indicated by the sample β's themselves.

Fig. 41.—Diagram to determine the type of a frequency distribution from knowledge of the constants β₁ and β₂.*

* Reproduced with permission from Tables for Statisticians and Biometricians, Part I, p. 66. In this diagram U_II refers to type IIb, and J_I and U_I refer to limited range . . . Statisticians and Biometricians, Part I, p. lxiii.

Thus the type of Pearsonian curve that fits a given set of data may be found immediately by substitution of the corrected β's in the criterion formula or by the location of the β point on Fig. 41. The equation for the curve (in differential form) may be obtained by substituting the corrected moments and β's in Eq. (10), as was indicated above. Since this is a differential equation that gives the relative slope of a curve at each point and not the actual ordinate, it is necessary to derive from it an equation for the ordinates of a curve before the curve can be sketched. This problem will be discussed more fully in the section on testing goodness of fit.¹

. . . of curves. As developed in Chap. IV, the logical classification is into the normal curve, the type III curve, and the others. Cf. Tables for Statisticians . . .

Theoretical Sampling Curves. Sometimes laboratory experiments are undertaken to test certain sampling theories. These yield empirical sampling distributions. Sampling theory may suggest that these empirical distributions should conform to certain theoretical sampling distributions. To test this theory, or possibly to see how well the empirical distribution is approximated by some theoretical distribution, a particular sampling curve is fitted to the empirical distribution. In such cases the process of fitting is relatively simple. To fit the t curve, the χ² curve, or the F curve, it is necessary merely to determine the appropriate values for n, or for n₁ and n₂, and then plot the curve as described in the first section. The goodness of fit can be determined by graphic comparison or by a χ² test such as is described in the next section.


TESTING GOODNESS OF FIT 


Two methods of testing whether a 
ample data will be described 
comparison and the x? test. 
are described in Chap. 


Tuna a Normal Curve. 

гаи curve fits а given set of s 
De These will be simple graphic c 
хүү methods of testing for normality 
M таре Comparison. 
раша of a standard normal curv 
Bein merely in plotting the ordina: 
abs cted values of 2/8. The pro 
. 201558, scale and the curve ordinates 50 


a given 5 


ah 
istogr 5 Я 
.IStogram with a given mean and 


© process is as follows 
are Irst find what the mid-points of t 
in terms of standard deviation uni 


In the first section of this chapter the 
ге was explained. This con- 
tes of the standard curve at 
will be to adjust the 
that the curve will fit 


he histogram class intervals 
ts measured from the mean 


1 
See pp. 137-152. 
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as an origin. This can be done by subtracting the mean from 
each of the mid-values and dividing these deviations by the 
(corrected) standard deviation. For the values of $x/\hat{\sigma}$ so obtained, the ordinates of the standard normal curve may be found from Table VI in the Appendix. If the histogram with which the normal curve is to be compared is such that relative frequencies are measured by the areas of the rectangles erected on each interval, then the ordinates of the standard curve should be divided by $\hat{\sigma}$ to allow for the fact that the abscissa scale for the histogram is in absolute units while the abscissa scale for the standard curve is in standard deviation units.¹ If the histogram is such that absolute frequencies are measured by the heights of the rectangles erected on each interval, then the ordinates of the standard curve must be multiplied by $Ni/\hat{\sigma}$, where $i$ is the size of the class interval and $N$ the number of cases.² When the standard ordinates have been properly adjusted, they can be plotted on the same graph as the given histogram and the goodness of fit may be examined visually. Fits that are obviously good or obviously bad may be identified in this way.

TABLE 17a. CALCULATION OF THE ORDINATES OF THE NORMAL CURVE THAT FITS
THE DISTRIBUTION OF HEIGHTS OF 300 PRINCETON FRESHMEN

Columns: (1) mid-point X, in inches; (2) x = X − X̄; (3) x/σ̂; (4) ordinate of the
standard normal curve; (5) ordinate of the fitted curve, (4) × Ni/σ̂.

  (1)      (2)      (3)       (4)       (5)
 62.5    −7.97    −3.22    0.00224     0.27
 63.5    −6.97    −2.82    0.00748     0.91
 64.5    −5.97    −2.42    0.02134     2.59
 65.5    −4.97    −2.01    0.05292     6.43
 66.5    −3.97    −1.59    0.11270    13.69
 67.5    −2.97    −1.19    0.19652    23.87
 68.5    −1.97    −0.80    0.28969    35.19
 69.5    −0.97    −0.39    0.36973    44.91
 70.5     0.03     0.01    0.39892    48.45
 71.5     1.03     0.42    0.36526    44.36
 72.5     2.03     0.82    0.28504    34.62
 73.5     3.03     1.22    0.18954    23.02
 74.5     4.03     1.63    0.10567    12.83
 75.5     5.03     2.04    0.04980     6.05
 76.5     6.03     2.44    0.02033     2.47
 77.5     7.03     2.84    0.00707     0.86

X̄ = 70.47;  σ̂ (corrected) = 2.47

¹ The area of the histogram in this case is one regular unit, while the area of the standard curve is one standard deviation unit. To make the two equal, the ordinates of the standard normal curve must all be divided by $\hat{\sigma}$.

² The area of this histogram is $\sum Fi = Ni$. Hence, after the ordinates have been adjusted for the difference in abscissa scales by division by $\hat{\sigma}$, as explained above, they must be multiplied by $Ni$ to take account of the fact that the area of this second form of the histogram is $Ni$ absolute units and not one unit.

[Figure: Fig. 42a. Normal curve fitted to heights of 300 Princeton freshmen.]

The whole procedure of graphically comparing a histogram with a fitted normal curve is illustrated by Table 17a, and the result obtained is shown in Fig. 42a. For this example, the fit is clearly a good one. It must be concluded, therefore, that the heights of young men of approximately the same age are normally distributed.

The $\chi^2$ Test. When it is difficult to tell by graphic comparison whether a normal curve is a good fit to a given set of data, some more refined method must be undertaken. One such method is the $\chi^2$ test, so called because it makes use of the $\chi^2$ distribution in determining the goodness of fit. The essence of the test is a numerical comparison of the frequencies of the curve and histogram, interval by interval. The procedure is as follows:

To calculate the curve probabilities for each interval it is first necessary to express the limits of these intervals in terms of standard deviation units measured from the mean as an origin. Thus $x/\hat{\sigma}$ for each class limit can be found by subtracting the mean from the class limit and dividing by the corrected standard
deviation. The process is illustrated in columns (1), (2), and (3) 
of Table 17b. 
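The standardization of columns (1) to (3), and the ordinate scaling used for Table 17a, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: `math.exp` stands in for a lookup in Table VI, the function names are hypothetical, and the mid-points are spaced one inch apart as in Table 17a. Because $x/\hat{\sigma}$ is not rounded to two decimals before the ordinate is evaluated, a few printed entries may differ slightly in the last figure.

```python
import math


def standard_normal_ordinate(z: float) -> float:
    """Ordinate of the standard normal curve at z (the quantity Table VI tabulates)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fitted_normal_ordinates(midpoints, mean, sigma, n_cases, interval):
    """Rows (X, x, x/sigma, standard ordinate, fitted ordinate) as in Table 17a.

    The fitted ordinate is the standard ordinate multiplied by N*i/sigma,
    which makes it comparable with a histogram of absolute frequencies.
    """
    scale = n_cases * interval / sigma
    rows = []
    for X in midpoints:
        x = X - mean                              # column (2): deviation from the mean
        z = x / sigma                             # column (3): standard deviation units
        phi = standard_normal_ordinate(z)         # column (4)
        rows.append((X, x, z, phi, phi * scale))  # column (5)
    return rows


# Heights of 300 freshmen: mean 70.47, corrected sigma 2.47, class interval 1.
rows = fitted_normal_ordinates([62.5 + k for k in range(16)], 70.47, 2.47, 300, 1.0)
for X, x, z, phi, y in rows:
    print(f"{X:5.1f} {x:6.2f} {z:6.2f} {phi:.5f} {y:6.2f}")
```

Dividing by $\hat{\sigma}$ and multiplying by $Ni$ are folded into the single factor `scale`, which is the $Ni/\hat{\sigma}$ of column (5).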

The next step is to find from the area table for the normal 
curve the probabilities, or relative frequencies, of cases lying 
between the mean and the various class limits. Then the rela- 
tive frequencies, or probabilities, for each interval may be found 
by a process of subtraction, and these may be converted to
absolute frequencies by multiplying by the number of cases N. 
This step is illustrated in columns (4), (5), and (6) of Table 17b. 

If it should turn out that the curve frequency for any interval 
is less than 5, it should be combined with a neighboring interval 
or intervals so that the frequency of every interval is at least 5. 
Such adjustments are almost always necessary at the ends of the 
curve. In this connection it should be pointed out that the end
intervals should always be taken as running to $\pm\infty$, so that the
total frequency for the curve will be the same as that for the
histogram.
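The steps just described (areas from the normal curve, end intervals running to $\pm\infty$, and consolidation of intervals whose curve frequency is below 5) might be sketched as follows. The helper names are hypothetical, `math.erf` replaces the area table, and the merging rule (tails only, working inward) is an assumption that covers the usual case the text mentions rather than every possible layout.

```python
import math


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve from minus infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_normal_frequencies(inner_limits, mean, sigma, n_cases):
    """Curve frequencies f for the intervals fixed by the inner class limits.

    The first interval runs from minus infinity and the last to plus
    infinity, so the curve frequencies add up exactly to N.
    """
    areas = [0.0] + [normal_cdf((limit - mean) / sigma) for limit in inner_limits] + [1.0]
    return [n_cases * (upper - lower) for lower, upper in zip(areas, areas[1:])]


def combine_small_intervals(observed, expected, minimum=5.0):
    """Merge end intervals into their neighbours until each curve frequency is >= minimum.

    Only the two tails are merged, since that is where small curve
    frequencies normally occur; interior intervals are left alone.
    """
    F, f = list(observed), list(expected)
    while len(f) > 1 and f[0] < minimum:   # low tail
        first_F, first_f = F.pop(0), f.pop(0)
        F[0] += first_F
        f[0] += first_f
    while len(f) > 1 and f[-1] < minimum:  # high tail
        last_F, last_f = F.pop(), f.pop()
        F[-1] += last_F
        f[-1] += last_f
    return F, f
```

The histogram frequencies $F$ are merged together with the curve frequencies $f$, so the two lists always describe the same intervals.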


The final step is to compute the quantity $(F - f)^2/f$ for each interval and then sum for all intervals. Here $f$ stands for the curve frequency and $F$ for the histogram frequency. The value of

$$\sum \frac{(F - f)^2}{f}$$

is the final criterion of goodness of fit. As explained more fully below,¹ if normal curves are fitted to many sample histograms from a truly normal population, the various sample values of $\sum (F - f)^2/f$ will tend to form a sampling distribution of the form of the $\chi^2$ distribution, with $n$ equal to the number of class intervals (each group of combined intervals being counted as a single interval) minus 3. Hence, a selected point of the $\chi^2$ table for the proper value of $n$, say the .05 point, will serve as a critical value. For if the population from which the sample is taken is truly normal, there are only 5 chances out of 100 that the value of $\sum (F - f)^2/f$ will equal or exceed the .05 $\chi^2$ value. Hence, values of $\sum (F - f)^2/f$ greater than this .05 value suggest that the population is not truly normal; they are an index of bad fit.

¹ See pp. 331-333.


An illustration of the procedure for calculating $\sum (F - f)^2/f$ is given in Table 17b. The value of $\sum (F - f)^2/f$ there obtained supports the graphic analysis in showing that the normal curve is a good fit to the heights of young college men. The $\chi^2$ table shows that for $n = 11 - 3$ (i.e., the number of class intervals¹ less three), the probability of as great a value as 3.867 is between .80 and .90, which indicates a good fit.
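A sketch of the final criterion and of the tail probability that the text reads from a $\chi^2$ table. The function names are hypothetical; the tail probability is computed from the regularized incomplete gamma function by a power series instead of being read from a printed table.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (F - f)**2 / f over the class intervals."""
    return sum((F - f) ** 2 / f for F, f in zip(observed, expected))


def chi_square_sf(x: float, n: int) -> float:
    """Probability that chi-square with n degrees of freedom equals or exceeds x.

    Computed as 1 - P(n/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series. Adequate for the moderate
    values met in goodness-of-fit work; not tuned for very large x.
    """
    if x <= 0:
        return 1.0
    s, t = n / 2.0, x / 2.0
    term = math.exp(s * math.log(t) - t - math.lgamma(s + 1.0))
    total = term
    for k in range(1, 10_000):
        term *= t / (s + k)
        total += term
        if term < 1e-16 * total:
            break
    return max(0.0, 1.0 - total)


# Heights example: 11 intervals less 3 constants gives n = 8.
print(round(chi_square_sf(3.867, 8), 3))
```

For the heights example the call gives about .87, which lies between .80 and .90 as stated above.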

A Gram-Charlier Curve. The goodness of fit of a Gram-Charlier curve may also be tested by graphic comparison and by a $\chi^2$ test. These will now be described.

Graphic Comparison. The graphing of a Gram-Charlier curve is almost as easy as the graphing of a normal curve. Equation (7) shows that the ordinates of a Gram-Charlier curve are equal to the normal ordinates plus two different multiples of these ordinates, viz.,

$$-\frac{\mu_3}{6\hat{\sigma}^3}\left(\frac{3x}{\hat{\sigma}} - \frac{x^3}{\hat{\sigma}^3}\right) \quad\text{and}\quad \frac{\mu_4 - 3\mu_2^2}{24\hat{\sigma}^4}\left(\frac{x^4}{\hat{\sigma}^4} - \frac{6x^2}{\hat{\sigma}^2} + 3\right).$$

Graphing of the Gram-Charlier ordinates is facilitated by use of tables of the normal ordinate [called $\phi_0(x/\hat{\sigma})$], the normal ordinate times $(3x/\hat{\sigma} - x^3/\hat{\sigma}^3)$ [called $\phi_3(x/\hat{\sigma})$], and the normal ordinate times $(x^4/\hat{\sigma}^4 - 6x^2/\hat{\sigma}^2 + 3)$ [called $\phi_4(x/\hat{\sigma})$]. These values will be found in the Appendix, Table VI.

To get the ordinates of a particular Gram-Charlier curve it is necessary merely to multiply the values given for $\phi_3(x/\hat{\sigma})$ by $-\mu_3/(6\hat{\sigma}^3)$ and the values given for $\phi_4(x/\hat{\sigma})$ by $(\mu_4 - 3\mu_2^2)/(24\hat{\sigma}^4)$, and to add the algebraic sum of these (or to subtract if the sum is negative) to the values given for $\phi_0(x/\hat{\sigma})$.* When the ordinates so computed are multiplied by $Ni/\hat{\sigma}$, they become immediately comparable with the sample histogram from which the moments were computed, and the two may be plotted on the same graph.

¹ See p. 333 for discussion of the basis for selecting $n$.

* If $\mu_2$ and $\mu_4$ have been calculated from grouped data, they must be adjusted for Sheppard's corrections before substituting in these equations (see p. 134).
To illustrate the fitting of a Gram-Charlier curve and the computation of its ordinates, consider the distribution of weights of 300 Princeton freshmen, shown in Fig. 42b. The mean of this distribution is 151.8 pounds, its standard deviation is 17.8 pounds, and the values of $\mu_2$, $\mu_3$, and $\mu_4$ for this distribution are $\mu_2 = 318.094$, $\mu_3 = 3{,}566.48$, and $\mu_4 = 472{,}727$, Sheppard's corrections for grouping being applied in all cases.

[Figure: Fig. 42b. A Gram-Charlier curve and a Pearsonian curve fitted to the distribution of weights of 300 Princeton freshmen. Horizontal axis: weight in pounds; vertical axis: frequency. Plotted series: Pearsonian curve (Table 20, column 10), Gram-Charlier curve (Table 18, column 10), and histogram ordinates (Table 18, column 11).]

The equation for the Gram-Charlier curve that fits this distribution is thus

$$y = \phi_0\!\left(\frac{x}{\hat{\sigma}}\right) - .1048\,\phi_3\!\left(\frac{x}{\hat{\sigma}}\right) + .0697\,\phi_4\!\left(\frac{x}{\hat{\sigma}}\right), \qquad \hat{\sigma} = 17.8.$$
The steps taken to obtain the ordinates of this curve at the mid-points of various class intervals are indicated in Table 18. Thus in column (1) are written the values of the mid-points of the intervals, in column (2) their deviations from the mean of the distribution, and in column (3) the measure of these deviations in standard deviation units. In columns (4), (5), and (6) are entered the values of $\phi_0(x/\hat{\sigma})$, $\phi_3(x/\hat{\sigma})$, and $\phi_4(x/\hat{\sigma})$ given in Table VI of the Appendix¹ for the values of $x/\hat{\sigma}$ listed in column (3). In column (7) is entered the product of $-\mu_3/(6\hat{\sigma}^3)$ ($= -.1048$) times column (5), and in column (8) the product of $(\mu_4 - 3\mu_2^2)/(24\hat{\sigma}^4)$ ($= .0697$) times column (6). Column (9) contains the algebraic sums of the items of columns (4), (7), and (8), and in column (10) these sums are multiplied by $Ni/\hat{\sigma} = 168.2$. These final figures are the ones plotted in Fig. 42b. Column (11) contains the ordinates of the histogram.


TABLE 18. COMPUTATION OF THE ORDINATES OF THE GRAM-CHARLIER CURVE THAT
FITS THE DISTRIBUTION OF WEIGHTS OF 300 PRINCETON FRESHMEN

Columns: (1) mid-point X, in pounds; (2) x = X − X̄; (3) x/σ̂; (4) φ₀(x/σ̂);
(5) φ₃(x/σ̂); (6) φ₄(x/σ̂); (7) (5) × (−μ₃/6σ̂³); (8) (6) × (μ₄ − 3μ₂²)/24σ̂⁴;
(9) (4) + (7) + (8); (10) (9) × Ni/σ̂, the curve ordinates; (11) histogram ordinates.

 (1)    (2)     (3)     (4)      (5)      (6)      (7)      (8)     (9)    (10)  (11)
 105  −46.8   −2.62   .0129   +.1305   +.1152   −.0137   +.0080   .0072   1.21    2
 115  −36.8   −2.06   .0478   +.1225   −.2129   −.0128   −.0148   .0202   3.40    5
 125  −26.8   −1.50   .1295   −.1457   −.7043   +.0153   −.0491   .0957  16.10   20
 135  −16.8    −.94   .2565   −.5102   −.3901   +.0535   −.0272   .2828  47.57   52
 145   −6.8    −.38   .3712   −.4028   +.7990   +.0422   +.0557   .4691  78.90   61
 155    3.2     .18   .3924   +.2097  +1.1017   −.0220   +.0768   .4472  75.22   73
 165   13.2     .74   .3034   +.5506   +.0043   −.0577   +.0003   .2460  41.38   47
 175   23.2    1.30   .1714   +.2919   −.7341   −.0306   −.0512   .0896  15.09   25
 185   33.2    1.86   .0707   −.0605   −.4093   +.0063   −.0285   .0485   8.16    6
 195   43.2    2.42   .0214   −.1475   +.0461   +.0154   +.0032   .0400   6.73    5
 205   53.2    2.99   .0046   −.0811   +.1337   +.0085   +.0093   .0224   3.77    2
 215   63.2    3.54   .0008   −.0250   +.0643   +.0027   +.0045   .0080   1.35    2

−μ₃/(6σ̂³) = −.1048;  (μ₄ − 3μ₂²)/(24σ̂⁴) = .0697;  Ni/σ̂ = 168.2
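The computation of Table 18 can be reproduced in outline. This sketch assumes $\hat{\sigma} = \sqrt{\mu_2}$ and the sign convention implied by the table entries, $\phi_3(t) = (3t - t^3)\phi_0(t)$ and $\phi_4(t) = (t^4 - 6t^2 + 3)\phi_0(t)$. The function names are hypothetical, and ordinates are computed directly rather than looked up in Table VI, so the last figure may differ from the printed columns.

```python
import math


def phi0(t: float) -> float:
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def phi3(t: float) -> float:
    """Normal ordinate times (3t - t**3), the sign convention of Table 18."""
    return (3.0 * t - t ** 3) * phi0(t)


def phi4(t: float) -> float:
    """Normal ordinate times (t**4 - 6t**2 + 3)."""
    return (t ** 4 - 6.0 * t * t + 3.0) * phi0(t)


def gram_charlier_coefficients(mu2, mu3, mu4):
    """Return (-mu3 / 6 sigma^3, (mu4 - 3 mu2^2) / 24 sigma^4) with sigma = sqrt(mu2)."""
    sigma = math.sqrt(mu2)
    return -mu3 / (6.0 * sigma ** 3), (mu4 - 3.0 * mu2 ** 2) / (24.0 * sigma ** 4)


def gram_charlier_ordinate(t, c3, c4):
    """Column (9) of Table 18: phi0 + c3*phi3 + c4*phi4 at t = x/sigma."""
    return phi0(t) + c3 * phi3(t) + c4 * phi4(t)


# Weights example: Sheppard-corrected moments quoted in the text.
mu2, mu3, mu4 = 318.094, 3566.48, 472727.0
c3, c4 = gram_charlier_coefficients(mu2, mu3, mu4)
sigma = math.sqrt(mu2)
scale = 300 * 10 / sigma            # N*i/sigma with i = 10 pounds
for X in range(105, 225, 10):
    t = (X - 151.8) / sigma
    print(X, round(gram_charlier_ordinate(t, c3, c4) * scale, 2))
```

Multiplying column (9) by `scale` gives the curve ordinates of column (10).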


¹ It is to be noted that for negative values of $x/\hat{\sigma}$ the signs of the $\phi_3(x/\hat{\sigma})$ entries in the table must be reversed.

The $\chi^2$ Test. The $\chi^2$ test of goodness of fit of a Gram-Charlier curve is the same as the $\chi^2$ test for a normal curve. Frequencies of curve and histogram are compared interval by interval, and the value $\sum (F - f)^2/f$ is computed and compared with the .05 point of a $\chi^2$ table. The procedure will be illustrated with reference to the distribution of weights discussed in the previous section.

Before the procedure itself is described, a brief explanation of how to compute relative frequencies or probabilities (areas) for a Gram-Charlier curve is in order. To calculate the area under a Gram-Charlier curve for a given range of $x/\hat{\sigma}$ values it is necessary first to compute the normal area and then adjust for the effects of the $\phi_3(x/\hat{\sigma})$ and $\phi_4(x/\hat{\sigma})$ terms. The adjustment required by the $\phi_3(x/\hat{\sigma})$ term is given by the product of three factors: $x^2/\hat{\sigma}^2 - 1$, the ordinate of the normal curve for $x/\hat{\sigma}$, and $-\mu_3/(6\hat{\sigma}^3)$. The adjustment required by the $\phi_4(x/\hat{\sigma})$ term is simply $\phi_3(x/\hat{\sigma})$ times $(\mu_4 - 3\mu_2^2)/(24\hat{\sigma}^4)$. Thus to get the area under a Gram-Charlier curve from $-\infty$ to $x/\hat{\sigma}$, write down the area given by Table VI of the Appendix [this table gives the area between the mean 0 and $x/\hat{\sigma}$; to find the area from $-\infty$ to $x/\hat{\sigma}$, subtract the table area from .5 if $x/\hat{\sigma}$ is negative, or add it to .5 if $x/\hat{\sigma}$ is positive], and add to it algebraically the value of

$$\left[\left(\frac{x^2}{\hat{\sigma}^2} - 1\right)\phi_0\!\left(\frac{x}{\hat{\sigma}}\right)\right]\left(-\frac{\mu_3}{6\hat{\sigma}^3}\right)$$

plus the value of

$$\frac{\mu_4 - 3\mu_2^2}{24\hat{\sigma}^4}\,\phi_3\!\left(\frac{x}{\hat{\sigma}}\right).$$

To calculate the area between any two values of $x/\hat{\sigma}$, calculate the areas from $-\infty$ to each of these values, and take the difference.

Table 19 illustrates the procedure just outlined by showing how the areas under the Gram-Charlier curve fitted to the weights of the 300 Princeton freshmen are determined for various class intervals. Thus columns (1) to (3) convert the original class limits into $x/\hat{\sigma}$ deviations from the mean. Column (4) gives the areas under the normal curve from $-\infty$ to the upper limit of each class interval, now measured in $x/\hat{\sigma}$ units. Columns (5), (6), (7), and (8) show the computation of the adjustment required in each case by the $\phi_3(x/\hat{\sigma})$ term [the final adjustment here is entered in column (8)], and columns (9) and (10) show the computation of the adjustment required by the $\phi_4(x/\hat{\sigma})$ term [the final adjustment is given in column (10)]. Column (11) is the sum of the items of columns (4), (8), and (10) and represents the final areas from $-\infty$ to the given values of $x/\hat{\sigma}$. In column (12) the areas for the individual class intervals are computed by taking the successive differences between the items in column (11). These figures give the proportion of the total area contained in each interval. As indicated in the headings of the table, the sample values of $\mu_2$, $\mu_3$, and $\mu_4$ are taken to be the parameter values, so the areas computed refer to the fitted curve.
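The area rule just described can be sketched in the same way. Names are hypothetical; `math.erf` replaces the area table, and the end intervals are fixed at 0 and 1 explicitly because the adjustment terms cannot be evaluated numerically at infinite limits. Multiplying each interval area by $N$ gives the curve frequencies of column (13).

```python
import math


def normal_cdf(t: float) -> float:
    """Area under the standard normal curve from minus infinity to t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def phi0(t: float) -> float:
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def gram_charlier_cdf(t, c3, c4):
    """Area from minus infinity to t under phi0 + c3*phi3 + c4*phi4.

    Normal area, plus c3 * (t**2 - 1) * phi0(t) for the phi3 term,
    plus c4 * phi3(t) = c4 * (3t - t**3) * phi0(t) for the phi4 term.
    """
    p = phi0(t)
    return normal_cdf(t) + c3 * (t * t - 1.0) * p + c4 * (3.0 * t - t ** 3) * p


def gram_charlier_interval_areas(limits, c3, c4):
    """Areas between successive standardized limits, with open end intervals."""
    cum = [0.0] + [gram_charlier_cdf(t, c3, c4) for t in limits] + [1.0]
    return [upper - lower for lower, upper in zip(cum, cum[1:])]
```

The two adjustment terms are the integrals of $\phi_3$ and $\phi_4$: the integral of $(3t - t^3)\phi_0$ is $(t^2 - 1)\phi_0$, and the integral of $(t^4 - 6t^2 + 3)\phi_0$ is $(3t - t^3)\phi_0$, which is why $\phi_3$ reappears in the second adjustment.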

The remaining part of Table 19 is concerned with testing the goodness of fit of the curve to the given histogram. Thus in column (13) the relative frequencies given in column (12) (areas under a frequency curve, it will be recalled, measure relative frequencies) are multiplied by $N = 300$ to give absolute frequencies that are comparable with the frequencies of the sample histogram. Then columns (14) to (17) carry out the calculations necessary to compute the value of $\sum (F - f)^2/f$. This is found to be 11.6075. A $\chi^2$ table shows that for $n = 10 - 5$ (i.e., the number of class intervals¹ minus 5) the probability of as great a value as 11.6675 is between .02 and .05, which does not indicate a very good fit. Apparently the Gram-Charlier type curve is not
flexible enough in this instance to adjust itself to the sharp 
contours of the histogram. It is possible that a Pearsonian type 
curve might give better results; this will be examined in the 
next section. 

A Pearsonian Curve. Graphic comparison and the $\chi^2$ test can be used to test the goodness of fit of a Pearsonian curve as well as of a normal or Gram-Charlier curve. The following section will thus be devoted to the problems peculiar to a Pearsonian curve and will avoid, as far as possible, duplication of previous discussion.

Finding the Curve Equation. The Pearsonian equation (9), as noted above, is not an equation for the frequency curve itself, but one for its relative slope.² The former can be readily obtained from the latter, however, by the use of the integral calculus. Although it is possible in any practical case to substitute the computed values of the moments in Eq. (9) and then find the frequency curve to which it gives rise, it is generally easier to carry out the transformation (integration) first and then substitute the values of the moments in the resulting equation. The process of integration and the development of curve equations for the various types of curves are explained in detail in W. P. Elderton, Frequency Curves and Correlation. The use of these equations will again be illustrated by reference to the data on the weights of 300 Princeton freshmen. The $\mu$'s and $\beta$'s for these data are $\mu_2 = 318.094$, $\mu_3 = 3{,}566.48$, $\mu_4 = 472{,}727$, $\beta_1 = .4063$, and $\beta_2 = 4.6720$. On substitution in Eq. (12), it is found that the criterion of curve type $\kappa$ equals .161, which indicates a type IV curve. This is also indicated by Fig. 41.

¹ See p. 333 for a fuller discussion of the basis for selecting $n$.

² See p. 57.
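The criterion $\kappa$ can be checked numerically. The formula below is the usual form of Pearson's criterion, $\kappa = \beta_1(\beta_2+3)^2/[4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)]$. The text's own Eq. (12) is not reproduced in this section, so treat the formula as an assumption, although it does give $\kappa = .161$ for the weights data. The function names are hypothetical, the classification covers only the main types, and boundary cases are decided by exact comparison. That is adequate for illustration, but the warning above about sample values near a border line still applies.

```python
import math


def pearson_kappa(beta1: float, beta2: float) -> float:
    """Criterion kappa = beta1 (beta2 + 3)^2 / [4 (4 beta2 - 3 beta1)(2 beta2 - 3 beta1 - 6)]."""
    denominator = 4.0 * (4.0 * beta2 - 3.0 * beta1) * (2.0 * beta2 - 3.0 * beta1 - 6.0)
    if denominator == 0.0:
        return math.inf
    return beta1 * (beta2 + 3.0) ** 2 / denominator


def pearson_main_type(beta1: float, beta2: float) -> str:
    """Main Pearson type implied by beta1 and beta2 (boundaries by exact comparison)."""
    if beta1 == 0.0:                       # symmetric curves
        if beta2 == 3.0:
            return "normal"
        return "II" if beta2 < 3.0 else "VII"
    kappa = pearson_kappa(beta1, beta2)
    if math.isinf(kappa):                  # 2 beta2 - 3 beta1 - 6 = 0
        return "III"
    if kappa < 0.0:
        return "I"
    if kappa < 1.0:
        return "IV"
    if kappa == 1.0:
        return "V"
    return "VI"


print(round(pearson_kappa(0.4063, 4.6720), 3), pearson_main_type(0.4063, 4.6720))
```

For the weights data this prints `0.161 IV`, in agreement with the text and with Fig. 41.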

If the integration is carried out, it will be found that the formula for a type IV curve is¹

$$y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m} e^{-\nu \tan^{-1}(x/a)} \tag{12}$$

where

$$r = \frac{6(\beta_2 - \beta_1 - 1)}{2\beta_2 - 3\beta_1 - 6}, \qquad m = \frac{r + 2}{2},$$

$$\nu = \frac{r(r - 2)\sqrt{\beta_1}}{\sqrt{16(r - 1) - \beta_1(r - 2)^2}}, \qquad a = \sqrt{\frac{\mu_2}{16}}\,\sqrt{16(r - 1) - \beta_1(r - 2)^2},$$

$$y_0 = \frac{N}{a\,F(r, \nu)},$$

$F(r, \nu)$ is a special function of $r$ and $\nu$, and the origin is at the mean plus $\nu a/r$. An alternative equation is²

$$y = y_0 \cos^{2m}\theta\; e^{-\nu\theta} \tag{13}$$

where $\theta = \tan^{-1}(x/a)$ and $\theta$ in $e^{-\nu\theta}$ is measured in radians. The function $F(r, \nu)$ may be evaluated for various values of $r$ and $\nu$ from Karl Pearson's Tables for Statisticians and Biometricians, Table LIV, explained on pages lxxxi to lxxxiii.

Equation (13) is the easier form to employ in plotting ordinates, although it does not give ordinates at equal $x$ intervals. If the latter is essential, as it might be if the ordinates were to be used to compute areas, then Eq. (12) must be used. Equation (12) will be employed here. The calculations required for determining the curve equation are as follows:

¹ See Elderton, op. cit., p. 64.

² See ibid., p. 65.

Since $\beta_1 = .4063$ and $\beta_2 = 4.6720$, then, on substituting these sample values for the population parameters which they serve to estimate, it follows that

$$r = \frac{6(4.6720 - .4063 - 1)}{2(4.6720) - 3(.4063) - 6} = 9.2204$$

$$m = \tfrac{1}{2}(9.2204 + 2) = 5.6102$$

$$\nu = \frac{9.2204(9.2204 - 2)\sqrt{.4063}}{\sqrt{16(9.2204 - 1) - .4063(9.2204 - 2)^2}} = 4.0398$$

$$a = \sqrt{\frac{318.094}{16}}\,\sqrt{16(9.2204 - 1) - .4063(9.2204 - 2)^2} = 46.8374$$

To find the value of $F(r, \nu)$ to be used in computing $y_0$, resort must be had to Karl Pearson's Tables for Statisticians, Table LIV. The first step is to find $\varphi$, defined by the relationship $\tan\varphi = \nu/r$. For the given data,

$$\tan\varphi = \frac{4.0398}{9.2204} = .43814 \quad\text{and}\quad \varphi = 23.6605^\circ.$$

For $\varphi = 23^\circ$ and $r = 9$, Pearson's Tables give $\log F(r, \nu) = 9.2186422 - 10$; for $\varphi = 24^\circ$ and $r = 9$, $\log F(r, \nu) = 9.2488237 - 10$. Hence, for $\varphi = 23.6605^\circ$ and $r = 9$, the value of $\log F(r, \nu)$ is, by straight-line interpolation, $9.2385650 - 10$. Again, for $\varphi = 23^\circ$ and $r = 10$, $\log F(r, \nu) = 9.2347504 - 10$; for $\varphi = 24^\circ$ and $r = 10$, $\log F(r, \nu) = 9.2686087 - 10$. Interpolation gives, for $\varphi = 23.6605^\circ$ and $r = 10$, $\log F(r, \nu) = 9.2571008 - 10$. Finally, interpolation between $9.2385650 - 10$ and $9.2571008 - 10$ gives, for $\varphi = 23.6605^\circ$ and $r = 9.2204$, $\log F(r, \nu) = 9.2426502 - 10 = -.7573498$. Consequently,

$$\log y_0 = \log 300 - \log 46.8374 - (-.7573498) = 1.5638783,$$

and $y_0 = 36.63$. The formula for the curve is thus

$$y = 36.63\left[1 + \frac{x^2}{(46.84)^2}\right]^{-5.610} e^{-4.0398\,\tan^{-1}(x/46.84)},$$

the origin being at $\dfrac{4.0398}{9.2204}(46.8374)$ plus the mean, or at $20.5 + 151.8 = 172.3$.
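The arithmetic of this section can be retraced in a short script. It is a sketch, not the authors' procedure: the four tabulated logarithms are the values quoted above from Pearson's Table LIV, $\sqrt{\beta_1}$ is taken as positive as in the worked example, and the function names are hypothetical. Small differences in the last decimals of the interpolated logarithm come from carrying $\varphi$ to more places than the text does.

```python
import math


def type_iv_constants(beta1, beta2, mu2):
    """r, m, nu, a and the mean-to-origin offset nu*a/r for a type IV curve.

    sqrt(beta1) is taken as positive, as in the worked example.
    """
    r = 6.0 * (beta2 - beta1 - 1.0) / (2.0 * beta2 - 3.0 * beta1 - 6.0)
    m = (r + 2.0) / 2.0
    root = math.sqrt(16.0 * (r - 1.0) - beta1 * (r - 2.0) ** 2)
    nu = r * (r - 2.0) * math.sqrt(beta1) / root
    a = math.sqrt(mu2 / 16.0) * root
    return r, m, nu, a, nu * a / r


def interpolate_log_f(phi_deg, r, corners):
    """Straight-line interpolation in phi, then in r, between tabulated log F values.

    `corners` maps (whole degree, whole r) to the tabulated figure written
    as "9.xxx - 10" in the text; only the 9.xxx part is stored.
    """
    p0, r0 = math.floor(phi_deg), math.floor(r)
    dp, dr = phi_deg - p0, r - r0
    at_r0 = corners[(p0, r0)] + dp * (corners[(p0 + 1, r0)] - corners[(p0, r0)])
    at_r1 = corners[(p0, r0 + 1)] + dp * (corners[(p0 + 1, r0 + 1)] - corners[(p0, r0 + 1)])
    return at_r0 + dr * (at_r1 - at_r0)


r, m, nu, a, offset = type_iv_constants(0.4063, 4.6720, 318.094)
phi = math.degrees(math.atan(nu / r))
# The four tabulated values quoted in the text from Pearson's Table LIV.
table = {(23, 9): 9.2186422, (24, 9): 9.2488237, (23, 10): 9.2347504, (24, 10): 9.2686087}
log_f = interpolate_log_f(phi, r, table) - 10.0
y0 = 10 ** (math.log10(300) - math.log10(a) - log_f)    # y0 = N / (a F)
print(round(r, 4), round(m, 4), round(nu, 4), round(a, 4), round(offset, 1), round(y0, 2))
```

The order of interpolation (first in $\varphi$ at $r = 9$ and $r = 10$, then in $r$) follows the text; interpolating in the other order gives the same bilinear result.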


Graphic Comparison. From this equation the ordinates of the frequency curve $y$ may be computed as indicated in Table 20. It will be noted that the $x'$ values are taken as the mid-points of the various class intervals. The final results are to be found in column (10); the ordinates of the histogram to which the curve is fitted are given in column (11) for the sake of comparison. The curve and histogram are graphed in Fig. 42b.

TABLE 21. COMPUTATION OF AREAS FROM ORDINATES OF TABLE 20

Class interval   Area or frequency
  …               …*
110–120          7.7 + .2 = 7.9
120–130          17.3 + .3 = 17.6
130–140          35.2 + .3 = 35.5
140–150          59.9 − (1/24)(24.7 − 14.7) = 59.9 − .4 = 59.5
150–160          74.6 − (1/24)(14.7 + 14.8) = 74.6 − 1.2 = 73.4
160–170          59.8 − (1/24)(−14.8 + 31.3) = 59.8 − .7 = 59.1
170–180          28.5 − (1/24)(−31.3 + 20.1) = 28.5 + .5 = 29.0
180–190          8.4 − (1/24)(−20.1 + 6.0) = 8.4 + .6 = 9.0
  …               …*

* Areas of these intervals are taken equal to ordinates.

TABLE 22. THE χ² TEST OF GOODNESS OF FIT OF THE CURVE

Weight (pounds)    F      f       F − f    (F − f)²   (F − f)²/f
Below 110†          2     6.5     −4.5      20.25       3.12
110–120             5     7.9     −2.9       8.41       1.06
120–130            20    17.6      2.4       5.76        .33
130–140            52    35.5     16.5     272.25       7.67
140–150            61    59.5      1.5       2.25        .38
150–160            73    73.4      −.4        .16        .00
160–170            47    59.1    −12.1     146.41       2.48
170–180            25    29.0     −4.0      16.00        .55
180 and over*†     15    11.5      3.5      12.25       1.06
Sum               300   300                            16.65

* Adjusted to make total area equal to 300, it being assumed that the area above 220 is zero.
† Consolidated so that the total area for each interval of comparison is at least equal to 5.
The $\chi^2$ Test of Goodness of Fit. To test the goodness of fit, the areas under the curve for the various class intervals may be computed from the quadrature equation (4) or (5).* This has been done in Table 21 by using Eq. (4). The final results are compared with the histogram frequencies in Table 22 and a $\chi^2$ test of goodness of fit is carried out. This table yields

$$\sum \frac{(F - f)^2}{f} = 16.65.$$

For $n = 9 - 5 = 4$,¹ the .01 point of a $\chi^2$ distribution is 13.277, which indicates that for the case in question the probability of a value of $\sum (F - f)^2/f$ equal to or greater than 16.65 is less than .01. Hence the fit is not an especially good one. Apparently, owing to the sharp peak in the center, the given data cannot be well fitted by a smooth frequency curve.

* See p. 128.

¹ For explanation, see p. 333.
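Table 21 applies a quadrature correction to ordinates taken at the mid-points of unit-width intervals. Its entries are consistent with the rule area $\approx y_k - \tfrac{1}{24}(\Delta_{k-1} - \Delta_k)$, that is, $y_k$ plus one twenty-fourth of the second difference. Since Eq. (4) itself is not reproduced in this section, the rule in this sketch is an assumption inferred from the table, and the function name is hypothetical.

```python
def strip_areas(ordinates):
    """Quadrature areas for unit-width strips centred on the interior ordinates.

    For each interior ordinate y_k the area is taken as
    y_k - (1/24) * (D_prev - D_next), where D_prev = y_k - y_(k-1) and
    D_next = y_(k+1) - y_k; this equals y_k plus one twenty-fourth of the
    second difference. The first and last ordinates only supply differences.
    """
    return [
        y - ((y - prev) - (nxt - y)) / 24.0
        for prev, y, nxt in zip(ordinates, ordinates[1:], ordinates[2:])
    ]


# Ordinates used in Table 21 for the intervals 130-140 through 180-190.
print([round(area, 1) for area in strip_areas([35.2, 59.9, 74.6, 59.8, 28.5, 8.4])])
```

The call prints the interior areas 59.5, 73.4, 59.1, and 29.0, which agree with the corresponding entries of Table 21.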


PART II 
Elementary Theory of Random Sampling 


CHAPTER VIII
A PREVIEW OF SAMPLING THEORY 


An important part of statistical theory is concerned with the logical basis of inferences about a population, i.e., about a large set of cases, from which a sample has been taken. This is called the “theory of sampling.”

In many instances a study of the whole population is impracticable, if not impossible. The set of children born to members of the white race goes back to the dim past and, so far as we know, will continue into the unknown future. For all practical purposes it is an infinite population of children. Any study of this population, for example, a study of the percentage of male and female births, must be made from a sample. Again, the registered voters in the United States form a population of many millions. If an institute of public opinion wishes to discover the trend of political sentiment in the country, it might conceivably approach every one of these millions of voters and ask him what party or candidate he favors. Such a “straw vote,” however, would necessitate considerable preliminary preparation and would be very costly, and the tabulation of returns would be an elaborate clerical and statistical problem; only the United States Bureau of the Census could venture to undertake such a job. A small private institute must necessarily determine public opinion from a carefully selected sample.
For similar reasons, sampling is employed in many other fields. It is used to study qualities of manufactured products, yields of various agricultural techniques, results of different medical treatments, effects of suggested educational methods, and the like. Sampling analysis is not confined to human populations but may be applied to the study of populations of inanimate objects, plants, and animals; it has general applicability. Statisticians often speak of a “universe” instead of a “population”; the two words are interchangeable in sampling analysis.


RANDOM SAMPLING 


Random Sampling and Probability. If a sample of data is obtained in a manner that may be characterized as “random,” it is possible to make certain inferences about the population from which the sample has been drawn. With respect to a random sample the calculus of probability and the theory of frequency curves developed in previous chapters may be applied with a reasonable degree of accuracy. In random sampling, for example, it is possible to compute the probability of wrongfully rejecting a given hypothesis regarding the population. It is also possible to calculate ranges of values that may be stated to cover the actual population values with a given probability.¹ The randomness of a sample accordingly affords an essential basis for logical inferences regarding the population.

Sampling that is purposive and not random will be discussed briefly at the end of this chapter. Samples obtained by this method may be “thought” to be good representations of the population, but just how good is indeterminate; nor can it be determined by the use of this method how often incorrect inferences regarding the population may occur. Random sampling is the only method so far devised that permits logical inferences about a given population.²

Types of Populations. Before discussing the technique of
random sampling it is desirable to distinguish several different 
types of populations. A population may be either “existent” 
or “hypothetical.” The registered voters in the United States, 
the stock of machine parts in a given storeroom, the wool sheep 
on a given ranch are all existent populations. The set of heads 


1 Testing of hypotheses and determination of “confidence intervals” are 
discussed briefly in a subsequent section of this chapter (see pp. 163-174) 
and developed more fully in the chapters that follow. 

² It must be admitted, however, that confidence in an inference based on a random sample is dependent on the “thought” or firm belief that it is a truly random one. Whether thought with respect to randomness is any sounder, as a basis for inferences, than thought with respect to representativeness of a sample obtained by some other method is a debatable question.
and tails to be obtained by the indefinite tossing of a given coin, the children that have been and will be born to members of the white race, the results of the repeated performance of a given physical, biological, or other type of scientific experiment are all hypothetical populations. In some cases an existent population is part of a larger hypothetical population; for example, the machine parts in a storeroom, which of themselves form an existent population, may be viewed as a portion of the hypothetical population of parts that would be produced by the indefinite continuation of the given manufacturing process. Again, hypothetical populations may sometimes be the products of random selection from given existent populations. The balls in a bag, for example, form an existent population, but the balls that might be drawn from the bag with replacements would form a hypothetical population.

Populations may also be classed as “finite” or “infinite.” The body of registered voters in the United States is a finite population, although it is so large that for some purposes it may be considered infinite. The set of heads and tails obtained by the endless tossing of a coin, the continuous births of white children, and the results of an indefinite repetition of any physical or biological process all constitute infinite populations. It is possible, of course, for a hypothetical population to be finite or infinite. The first 100 heads and tails to be obtained from the tossing of a coin form a finite hypothetical population. Existent populations, however, are almost universally finite.¹

Technique of Random Sampling. As indicated in the chapter on probability, randomness is largely a matter of intuition. Probability theory considers the set of all possible different samples and derives their distribution according to some criterion. If probability theory is to be used in predicting the results of any method of sampling, the method should be such that, if repeated a large number of times, it will tend to yield all possible different samples with equal frequency. Such a method is called a “random” method.

¹ The foregoing classification of populations follows closely that of G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics, pp. …-334. Also, see M. G. Kendall and B. Babington Smith, “Randomness and Random Sampling Numbers,” Journal of the Royal Statistical Society, Vol. 101 (1938), pp. 147-166.
General notions regarding the concept of probability suggest
that, if every member of the population is given an equal chance
of being selected, or, to put it another way, if the selection of any
member of the population is independent of the attribute it
assumes, the results may be those desired. There is no definite
assurance, however, that any special technique will conform to
these criteria. All that can be done, and all that has been done,
is to apply to a known population some apparently random
method and to compare the results obtained with those expected 
upon the basis of probability theory. If the actual and theoret- 
ical results agree reasonably well when applied to the known 
population, it is concluded that the given method will also yield 
random samples when applied to an unknown population. The 
danger exists, nevertheless, that a method of selection yielding 
random samples in one instance may not give random samples 
in another instance. In the final analysis, belief that a particular 
method will produce random results rests on intuition guided 
by past experience. Some of the methods that have been 
devised to obtain random samples are ordinal selection, mechan- 
ical randomizing devices, tables of numbers, random sampling 
numbers, and natural selection. 
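The procedure of applying an apparently random method to a known population and comparing the results with probability theory can be illustrated in miniature. In this sketch the method being tried is Python's `random.sample`, the known population is the five labels 0 to 4, and the function name is hypothetical. A large $\chi^2$ value (with 9 degrees of freedom here) would suggest that the method does not yield all possible samples with equal frequency.

```python
import itertools
import random
from collections import Counter


def equal_frequency_check(population, sample_size, draws, seed=0):
    """Apply a sampling method to a known population and compare with theory.

    The method tried here is random.sample. A random method should yield
    every one of the C(N, n) possible samples with equal frequency, so each
    should appear about draws / C(N, n) times. Returns the number of
    possible samples and the chi-square statistic for the observed counts.
    """
    rng = random.Random(seed)
    possible = list(itertools.combinations(sorted(population), sample_size))
    counts = Counter(
        tuple(sorted(rng.sample(population, sample_size))) for _ in range(draws)
    )
    expected = draws / len(possible)
    statistic = sum((counts[s] - expected) ** 2 / expected for s in possible)
    return len(possible), statistic


n_possible, statistic = equal_frequency_check(list(range(5)), 2, 10_000)
print(n_possible, round(statistic, 2))
```

As the text cautions, agreement on one known population does not guarantee that the same method will behave randomly on another.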

Ordinal Selection. The methods of selection described in this 
and the next three sections are methods devised to obtain random 
samples from finite populations. They are frequently used in 
the social sciences, since the sampled populations in these fields 
are often finite. 

If a given population consists of a list of objects in which the
attribute of an object is independent of its position on the list, 
members of the population selected by some ordinal position 
(every tenth, every twentieth, or every tenth position per page) 
will be a random sample for the purpose of studying the given 
attribute. For example, a list of students alphabetically 
arranged in a college directory would presumably be an arrange- 
ment independent of student heights. Accordingly, some 
ordinal selection, say the tenth and twentieth name on each 
page, would give a random sample for the purpose of studying 
student heights. 

The method of ordinal selection must be used with care. If a
reporter of a college newspaper visits every tenth room in a given
dormitory and if the dormitory is so arranged that each entry has
only 10 rooms, it might happen that he would visit only a first- 
floor room in each entry. If he were seeking data of an economic 
or social character, the preferred location of the rooms visited 
might give a definite bias to the sample obtained. This would 
be a case in which the method of selection failed to be independent 
of the attributes of the members of the population in which the 
reporter was interested. 
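The dormitory pitfall can be reproduced in a few lines. The Python sketch below is illustrative only: it assumes a hypothetical dormitory of 30 entries with 10 rooms each, two rooms per floor, and a helper `ordinal_selection` that does not come from the text. Because the interval of 10 equals the number of rooms per entry, every room visited occupies the same position in its entry.

```python
import random

def ordinal_selection(population, step, start=None):
    """Return every `step`-th member of `population`, beginning at `start`.

    If `start` is None, a starting position in range(step) is drawn at random.
    """
    if start is None:
        start = random.randrange(step)
    return population[start::step]

# Hypothetical dormitory: 30 entries of 10 rooms each, listed entry by entry.
# Rooms 0-1 of every entry are on the first floor, 2-3 on the second, and so on.
ROOMS_PER_ENTRY = 10
rooms = [(entry, room) for entry in range(30) for room in range(ROOMS_PER_ENTRY)]

def floor_of(room):
    """Floor number (1-5) of a room position within its entry."""
    return room // 2 + 1

visited = ordinal_selection(rooms, step=10, start=0)   # "every tenth room"
floors = {floor_of(room) for _, room in visited}
print(len(visited), floors)   # 30 {1}: every room visited is on the first floor
```

Any starting point gives the same kind of bias, since all visited rooms share one position within their entries. An interval with no common factor with the 10-room period, such as 7, cycles through every room position, though it offers no protection against other hidden regularities in a list.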

In the use of ordinal selection, care must also be taken to 
note whether the list from which selection is made comprises the 
whole population being studied or is merely assumed to be repre-
sentative of that population. The list of telephone subscribers
in the city of New York might be taken, for example, to be
representative of the whole adult population of the city. For
some purposes, however, this would be a risky assumption, for
very poor families cannot afford telephones, and any study of
the social characteristics of the population would be biased unless
it included representatives of this poorer section. The Literary
Digest poll in 1936 failed to predict accurately the outcome of
the presidential election because it relied heavily upon lists of
names that were not representative of the whole population with
respect to attributes affecting their votes.

Mechanical Randomizing Devices. Random samples are often
sought by the use of various “randomizing” devices. Suppose,
for example, that each member of a finite population is identified
by a number. If this number is written on a small piece of paper
and inserted in a small metal cylinder and if these cylinders are
put in a revolving drum and thoroughly mixed, the selection of a
sample of cylinders from the drum might possibly be taken to be
a random sample from the given population. This type of
randomizing device is used in many lotteries.

The shuffling of cards is another randomizing device. If the
population is small enough, the numbers representing its various
members may be written on cards and these may then be thor-
oughly mixed by shuffling. The withdrawal of a given number
of cards from the shuffled pack may be taken as a random sample
from the given population. Such a sample may be taken without
replacing any of the cards; or the number of each card may be
recorded, the card replaced, and the pack reshuffled before the
next card is drawn. Both these would constitute random
samples, but the probability of a given type of sample would be
different under the two procedures.¹
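The difference between the two card procedures shows up in whether a number can be drawn twice. The Python sketch below assumes a hypothetical population numbered 1 to 20; the helper names are illustrative, and Python's seeded pseudo-random generator stands in for physical shuffling.

```python
import math
import random

def draw_without_replacement(cards, n, rng):
    """Shuffle the pack once and take the top n cards; no number can repeat."""
    pack = list(cards)
    rng.shuffle(pack)
    return pack[:n]

def draw_with_replacement(cards, n, rng):
    """Record the top card, replace it, and reshuffle before each draw."""
    pack = list(cards)
    drawn = []
    for _ in range(n):
        rng.shuffle(pack)
        drawn.append(pack[0])  # the card is recorded and stays in the pack
    return drawn

rng = random.Random(1)
population = range(1, 21)  # hypothetical population numbered 1-20
print(draw_without_replacement(population, 5, rng))
print(draw_with_replacement(population, 5, rng))

# Probability that a with-replacement sample of 5 contains no repeated number:
# 20 * 19 * 18 * 17 * 16 ordered distinct draws out of 20 ** 5 possible draws.
p_all_distinct = math.perm(20, 5) / 20 ** 5
print(p_all_distinct)  # 0.5814
```

Under replacement, roughly 42 per cent of samples of 5 from 20 would contain some member twice, a type of sample that cannot occur at all without replacement.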

Similar devices make use of dice, roulette wheels, and other 
gambling instruments. Although a random sample may possi- 
bly be procured by such means, hidden bias in the mechanical 
device or in the procedure must be carefully avoided. If lottery 
cylinders are withdrawn by hand, the person making the with- 
drawals may have an unconscious predilection toward the 
cylinders of a certain shape, smoothness, or texture. If numbers 
are written on cards with ink, the difference in the size of the 
number might affect the stickiness of the cards and cause some 
to be withdrawn less often than others. The use of dice notori-
ously results in bias; Karl Pearson's study of the gaming results
at Monte Carlo indicates that the odds against the absence of
bias are extremely large.² If the possibility of deliberate falsi-
fication is precluded, the bias in the Monte Carlo results “would
appear to arise from small imperfections in the roulette wheel
which direct the ball into some compartments in preference to
others.”³ For these reasons, mechanical randomizing devices
of this kind are no longer in high favor. 

Tables of Numbers. When it is possible to associate a number
with each member of the population, tables of numbers such as
tables of squares and other powers, tables of logarithms, various
statistical tables, and even tables of telephone numbers are some-
times employed in the attempt to secure a random sample. For
example, if a population consists of 100 members and a number
from 0 to 99 is associated with each member of the population, a
statistical table involving seven-place figures, say, might be
opened to any arbitrarily selected page and the last two digits of
each of the first 10 lines of the table read off. The members of
the population whose numbers correspond to the 10 pairs of two
digits obtained in this way (09, 01, 07, etc., being read 9, 1, 7,
etc.) might be considered to be a random sample of 10 from the
given population.
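Reading off the last two digits is a simple mapping from table entries to member numbers. In the Python sketch below the seven-place entries are invented placeholders, not values from any real table, and `last_two_digits` is a hypothetical helper name.

```python
def last_two_digits(entries):
    """Turn each table entry into a member number from 0 to 99.

    Only the last two digits are used, so '09' is read as 9.
    """
    return [int(entry[-2:]) for entry in entries]

# Invented seven-place entries standing in for the first 10 lines of a page;
# they are placeholders, not values copied from any real table.
first_ten_lines = ["5830109", "2219401", "7706407", "1148253", "9902736",
                   "3417790", "6655518", "8083342", "4471865", "2506621"]

print(last_two_digits(first_ten_lines))
# [9, 1, 7, 53, 36, 90, 18, 42, 65, 21]
```

Nothing in this procedure prevents the same pair from appearing twice, and, as the next paragraph explains, the result is only as random as the digits of the table itself.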


1 For further discussion of these two cases, see Chap. IX, pp. 186-190,
209-211.

2 Chances of Death, Vol. I, Edward Arnold, London and New York (1897),
pp. 42-62.

3 KENDALL and BABINGTON SMITH, op. cit., p. 156.

The success of this method of obtaining a random sample
depends on the digits in the table occurring in a random order.
Unfortunately in many tables of numbers, such as tables of
logarithms, a relationship exists between the figures in successive
rows. Such tables, of course, cannot be used to draw a random
sample. Even tables that are apparently random in character
must be used with caution. Kendall and Babington Smith give
the following account of an experiment with telephone numbers:¹
“We have attempted to construct a random series by selecting
digits from the London Telephone Directory. In order to
exclude bias as far as possible, pages were taken by opening
the book haphazardly; numbers of less than four digits were
ignored; numbers associated with names printed in heavy type
were also ignored; and only the two right-hand digits were taken.

“It was found that a series of this kind was significantly biased.
There appeared a deficiency of fives and nines. ...

“The reasons for this effect are complicated and are not confined
to the obvious one that telephone engineers would avoid fives
and nines because of their assonance. It thus appears that the
London Directory is useless as a source of random digits.”
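A bias such as a deficiency of fives and nines can be detected with a frequency test: count each digit and compare the counts with the equal counts expected of a random series by means of a chi-square statistic. The Python sketch below uses a hypothetical series of 1,000 digits, not the London Directory data, which the text does not reproduce; the 5 per cent point of chi-square for 9 degrees of freedom (about 16.92) is taken from standard tables.

```python
from collections import Counter

# 5 per cent point of chi-square with 9 degrees of freedom, from standard tables.
CRITICAL_05_DF9 = 16.919

def digit_frequency_chi_square(digits):
    """Chi-square statistic comparing digit counts with equal expected counts.

    In a random series each digit 0-9 should occur about len(digits)/10 times;
    a large statistic signals that some digits are over- or under-represented.
    """
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))

# Hypothetical series of 1,000 digits with a deficiency of fives and nines.
biased = [d for d in range(10) for _ in range(60 if d in (5, 9) else 110)]

stat = digit_frequency_chi_square(biased)
print(stat, stat > CRITICAL_05_DF9)   # 40.0 True
```

A frequency test checks only how often each digit appears. A series can pass it and still fail other tests of randomness that look at the order of the digits.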

Random Sampling Numbers. Because ordinary tables of num-
bers may fail to yield random sets of digits, attempts have been
made to construct special tables of random sampling numbers
that may be used with some confidence. One such table is
Tippett’s Random Sampling Numbers. This consists of digits
picked at random from British census reports. The digits are
arranged in groups of four so as to provide 26 pages of 400 four-
figured numbers, or a total of 10,400 four-figured numbers.
These figures have been subjected to a number of different tests
of randomness. Generally the results were deemed satisfactory,
although one investigator² found that the figures were somewhat
“patchy” in meeting the tests. The table and a description
of the various methods of using it have been published by the
Department of Applied Statistics, University of London, Uni-
versity College, as Tract XV of its Tracts for Computers.

1 Ibid., pp. 156-157.

2 See YULE, G. UDNY, “A Test of Tippett’s Random Sampling Numbers,”
Journal of the Royal Statistical Society, Vol. 101 (1938), pp. 167-172.

Another set of 5,000 random sampling numbers has been
published in the Journal of the Royal Statistical Society by
M. G. Kendall and B. Babington Smith.¹ These were obtained
from a special randomizing machine, which its inventors describe
as follows:² “Essentially the machine consists of a disk divided
into 10 equal sections, on which the digits 0 to 9 are inscribed. 
The disk rotates rapidly at a speed which can, if necessary, be 
made constant to a high degree of approximation by means of a 
tuning fork. The experiment is conducted in a dark room, and 
the disk is illuminated from time to time by an electric spark or 
by a flash of a neon lamp, which is of such short duration that the 
disk appears to be at rest. At each flash a number is chosen 
from the apparently stationary disk by means of a pointer fixed 
in space. 

“In the actual experiment, the disk was rotated by an electric
motor at about 250 revolutions per minute. It was illuminated 
by a neon lamp in parallel with a condenser in an independent 
electric circuit which was broken by means of a key. Owing to 
experimental conditions, the time between the making of the 
circuit and the passing of the flash varied, but to add an extra 
element of randomness the key was tapped irregularly by the 
experimenter. Flashes occurred, on the average, about once in 
3 or 4 seconds.” 

These random sampling numbers of Kendall and Babington 
Smith have also been subjected to various tests of randomness 
with satisfactory results. 
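The quoted description suggests why irregular flashes give unpredictable digits. At 250 revolutions per minute the disk turns roughly 12.5 to 17 times in 3 or 4 seconds, so small variations in timing change which section lies under the pointer. The Python sketch below is a toy simulation, not a model of the actual apparatus. It assumes that the gap between flashes is uniform between 3 and 4 seconds, and the function name is hypothetical.

```python
import random
from collections import Counter

RPM = 250                      # disk speed reported in the quotation
REV_PER_SEC = RPM / 60         # about 4.17 revolutions per second

def flash_readings(n, rng, min_gap=3.0, max_gap=4.0):
    """Simulate n readings from a 10-section disk lit by irregular flashes.

    Assumption: the time between flashes is uniform on [min_gap, max_gap]
    seconds. At each flash the section under the fixed pointer (0-9) is read.
    """
    t = 0.0
    readings = []
    for _ in range(n):
        t += rng.uniform(min_gap, max_gap)     # irregular interval between flashes
        turns = t * REV_PER_SEC
        position = turns - int(turns)          # fraction of a revolution, in [0, 1)
        readings.append(int(position * 10))    # which of the 10 equal sections
    return readings

readings = flash_readings(10_000, random.Random(1938))
counts = Counter(readings)
print(sorted(counts.items()))
```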

To draw a random sample, tables of random sampling num- 
bers are used according to the procedure already explained in 
connection with other tables of numbers. After the members 
of the population are assigned numbers, a sample of size N is 
picked by selecting any set of N numbers from the table of 
random sampling numbers. Suppose, for example, that a
sample of 100 from a population of 10,000 commercial banks
doing business in a given area is desired. The banks are, it 
happens, listed in a directory. The first bank in the directory 
can thus conveniently be numbered 0 and the last 9999; and 
then Tippett’s tables can be opened to any page and the numbers 
in the first column read off. The banks whose numbers corre- 
spond to the numbers so selected will then constitute a random 
sample of 100 from the whole population of 10,000 banks. 
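The bank example can be sketched in Python. Tippett’s numbers are not reproduced here, so a seeded pseudo-random generator stands in for one column of the table. The text does not say what to do if a number turns up twice. This sketch assumes that each bank should appear at most once (sampling without replacement): it skips repeats, and it also skips numbers that name no member when the population is smaller than 10,000. The helper name is hypothetical.

```python
import random

def numbers_to_sample(table_numbers, sample_size, population_size):
    """Read table numbers in order and keep the first `sample_size` usable ones.

    A number is usable if it names a population member (0 to population_size - 1)
    and has not been taken already, so no bank is chosen twice.
    """
    chosen, seen = [], set()
    for number in table_numbers:
        if number < population_size and number not in seen:
            seen.add(number)
            chosen.append(number)
            if len(chosen) == sample_size:
                break
    return chosen

# Stand-in for one column of a random-number table (Tippett's numbers are not
# reproduced here): 400 pseudo-random four-figure numbers from a seeded generator.
rng = random.Random(1940)
column = [rng.randrange(10_000) for _ in range(400)]

banks = numbers_to_sample(column, sample_size=100, population_size=10_000)
print(len(banks), banks[:5])
```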

1 Op. cit., pp. 164-166.

2 Ibid., p. 157.

Instead of taking the numbers from the columns of the table,
they could have been selected by rows or even by diagonals. In
each case, a random sample will be obtained. On the assumption
that Tippett’s numbers are a random set, the selection of any
subset of these numbers in a way that is independent of the
numbers themselves will yield a random sample. Kendall’s
and Babington Smith’s numbers can be used in the same way.

Natural Selection. The methods of sampling described above
are of use primarily in sampling from finite existent populations.
For the most part they cannot be used to obtain random samples
from an unknown hypothetical infinite population.¹ A popula-
tion that would be hypothetically generated by the continuous
operation of some physical, biological, or social process is an
unknown hypothetical infinite population. When these proc-
esses occur in everyday life, any existent results of that process
may be tentatively taken as a random sample of the infinite
population of results. The selection here is a “natural” one in
that it is effected by the ordinary forces of everyday life.

The role of the investigator in the case of hypothetical infinite
populations is to study the circumstances under which the natural
selection of the sample was effected in order to see whether some
unusual influences might have given a special bias to this par-
ticular sample and consequently might have destroyed its natu-
rally random character. Thus any set of white birth statistics
might tentatively be taken as a random sample of the population
of white births. If, however, the sex ratio were being investi-
gated and it were known that geographical location had a sig-
nificant effect on the ratio of male to female births, then data on
white births in a given area could be looked upon as a random
sample only of white births in that area and not of births in all
areas.
Naturally occurring data thus provide their own method of
selection. This is true whether the process giving rise to them
is one of everyday life or a process carried out in some experi-
mental laboratory or research station. In testing the effects of a
certain serum, care may be taken to see that the guinea pigs
used for the experiment are not noticeably abnormal, but
otherwise the peculiar characteristics of each pig are those that
chance happens to provide. Again, in agricultural experiments
with fertilizer, the peculiar influences that affect each plot are
those that are provided by the chance variations in soil, wind,
rain, sunshine, and other natural forces.

1 When the population is known, however, and sampling is undertaken
primarily for experimental purposes, random sampling numbers may be
used to obtain random samples from infinite populations. See YULE,
G. UDNY, and M. G. KENDALL, op. cit., pp. 343-344.

In experimental research, however, care must be taken to 
design an experiment so that the balance of natural forces is not 
all to the one side or to the other for certain parts of the data. 
Thus, if the difference in rate of growth of self-fertilized and cross- 
fertilized plants is being studied, and if the cross-fertilized plants 
are all placed on the sunny side of the experimental plot, the 
effect of the sun and any possible effect of the difference in 
fertilization will be so confounded that it will be difficult if not 
impossible to determine whether any significant difference in the 
rate of growth is due to the one or to the other. If an equal 
number of plants of both types are assigned at random to the 
two sides of the plot, however, then the constant effects of the sun 
will be eliminated. With this arrangement or design of the 
experiment, the possibility that the recorded difference in rate 
of growth may be due to chance instead of to a difference in 
method of fertilization may be determined by probability theory.¹
It is the role of the experimenter in these cases so to design his 
experiment that natural forces do provide him with a random 
sample. 
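The design described above, with an equal number of plants of each type assigned at random to the two sides of the plot, can be sketched by randomizing separately within each type. The plant labels, the count of 8 per type, and the helper name below are hypothetical.

```python
import random

def split_at_random(plants, rng):
    """Shuffle the plants and split them into two equal halves."""
    shuffled = list(plants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(7)
cross = [f"cross-{i}" for i in range(8)]   # hypothetical cross-fertilized plants
selfed = [f"self-{i}" for i in range(8)]   # hypothetical self-fertilized plants

# Randomize within each type so each side of the plot gets four of each.
cross_sunny, cross_shady = split_at_random(cross, rng)
self_sunny, self_shady = split_at_random(selfed, rng)
sunny_side = cross_sunny + self_sunny
shady_side = cross_shady + self_shady
print(sorted(sunny_side))
```

Because both types are equally represented on the sunny side, a constant effect of the sun falls on each type alike and is not confounded with the method of fertilization.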
In the case of very large finite populations, natural selection
is also sometimes used to get a random sample. An organization
investigating public opinion, for example, may send an agent
to a given locality where the first dozen people he meets, of the
sort whose opinion is desired, may be taken as a random sample
of the given class of people. The sample is thus picked for him
by chance forces. For the 1940 United States census, schedules
were printed so that those persons whose names happened to
fall on certain lines of each schedule were asked supplementary
questions. Two out of the forty lines on each side of a schedule
were marked off for this purpose so that a 5 per cent sample was
obtained on the supplementary questions.²
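The census arithmetic is 2 marked lines out of 40, or 5 per cent. The Python sketch below assumes which two lines were marked, because the text does not say; the line numbers 14 and 29 are purely hypothetical.

```python
LINES_PER_SIDE = 40
# Hypothetical: the text says two lines were marked but not which two.
MARKED_LINES = {14, 29}

def asked_supplementary(line_number):
    """True if the person on this schedule line gets the supplementary questions."""
    return line_number in MARKED_LINES

marked = sum(asked_supplementary(n) for n in range(1, LINES_PER_SIDE + 1))
fraction = marked / LINES_PER_SIDE
print(f"{marked} of {LINES_PER_SIDE} lines -> {fraction:.0%} sample")  # 2 of 40 lines -> 5% sample
```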


1 Cf. FISHER, R. A., The Design of Experiments, Chap. III.

2 See STEPHAN, F. F., W. E. DEMING, and M. H. HANSEN, “The Sampling
Procedure of the 1940 Population Census,” Journal of the American Statisti-
cal Association, Vol. 35 (1940), pp. 615-630.

SAMPLING AND THE THEORY OF STATISTICAL INFERENCE


If a sample is drawn from an unknown population in a random 
manner, certain inferences about the population may be made 
on the basis of probability theory. The following sections will 
discuss in a general way the testing of particular hypotheses 
regarding a population and the estimation of population param- 
eters. The application of this general theory to special problems 
will be discussed in detail in subsequent chapters. 

Testing Hypotheses Regarding the Population. Often a
problem presents itself in such a way that a definite hypothesis
regarding the population is offered for testing. A fruit merchant
about to buy a carload of oranges wishes them to be of good
quality; he does not want to buy the carload, let us say, if more
than 10 per cent of the oranges are substandard. Accordingly,
he will want to test the hypothesis that the carload is 10 per cent
substandard. If this hypothesis is rejected, because a random
sample suggests that the number of substandard oranges is more
than 10 per cent, he will not buy the carload. But if the hypo-
thesis is not rejected, because the random sample suggests that
the number of substandard oranges is not greater than 10 per
cent, he will buy the oranges.
A similar problem is illustrated by the manufacturer of electric-
light bulbs who does not wish to adopt a proposed new process
of production unless the mean length of life of the bulbs produci-
ble by it is 1,000 kilowatt-hours or more. He knows from past
experience that electric-light bulbs vary in length of life in
accordance with the normal frequency curve. He will thus want
to test the hypothesis that the population of bulbs producible
by the new process is a normal population with a mean of 1,000
kilowatt-hours. If this hypothesis is rejected, because a random
sample suggests that the mean length of life is less than 1,000
kilowatt-hours, the manufacturer will not adopt the new process.
But if the hypothesis is not rejected, because the random sample
suggests that the mean length of life is not less than 1,000 kilo-
watt-hours, he will adopt the new process. These are examples
in which the problem itself suggests a particular hypothesis to
be tested and alternatives to it.
Errors in Testing Hypotheses. In testing a particular hypoth-
esis regarding a population, two kinds of errors may occur.
The first of these errors, hereafter referred to as “error I,” is the
rejection of the hypothesis when it is actually true. That is,
the hypothesis will be deemed unreasonable on the basis of the
sample obtained, although in fact the population is exactly as
the hypothesis assumes. The second kind of error, which will
be referred to as “error II,” is the failure to reject the hypothesis,
i.e., the failure to consider it unreasonable on the basis of the
given sample, when in fact the population is not the same as the
hypothesis assumes. The aim of the statistical testing of
hypotheses is twofold: it seeks on the one hand to limit the risk
of the first kind of error to a preassigned amount and, on the
other hand, to minimize the risk of the second kind of error.

Procedure for Limiting Risk of Error I. To limit the risk of
falsely rejecting a given hypothesis to a preassigned amount, the
following procedure is adopted:

First, the hypothesis to be tested is assumed to be true.

Second, a certain statistic is selected, such as a mean or
standard deviation, and probability theory is employed to show
how this statistic might be expected to vary from sample to
sample on the assumption that the hypothesis is true. The
process consists in supposing that a large number of samples of
given size have been drawn at random from the assumed popula-
tion and then finding the frequency distribution of the sample
values of the selected statistic. This distribution of sample
values is called the “sampling distribution” of the statistic, and
the standard deviation of this distribution is called the “standard
error” of the statistic. The argument here is purely theoretical
[and rests on the] probability calculus.

Third, [it is necessary to] specify the degree of risk that the
investigator [is willing to run] in rejecting the given hypothesis
when it is true. This degree of risk is called the “coefficient of risk.”

The fourth step is to study the sampling distribution of the
selected statistic and to mark off a range of values for which
the probability of the statistic falling within it is just equal to the
coefficient of risk. This range is called the “region of rejection,”
[and the remaining range of] values is called the “region of accept-
ance.” Suppose that the adopted coefficient of
[risk is .05 and that the sampling distribution of the mean is a]
normal frequency distribution, the mean of which is 1,000 and
the standard deviation of which is 20.¹ From the properties of
a normal distribution, it is known that .05 of the frequencies of a
normal distribution lie below the mean minus 1.645σ. Hence the
range of values below 1,000 - 1.645(20) = 967.1 could be taken
as a .05 region of rejection. The range of values from 967.1
upward would constitute the corresponding region of acceptance
(see Fig. 43).

The final step is to note whether the value of the selected
statistic for the given sample falls in the region of rejection or the
region of acceptance. For example, if the sampling distribution
and the regions of rejection and acceptance were as indicated
above, and if the mean of the given sample were 965.2, the given
hypothesis would be rejected because this sample value would
fall in the range below 967.1. If, on the other hand, the sample
mean had been 972.3, then the hypothesis would have been
accepted, for the sample value would then fall in the range
above 967.1.

Fig. 43. An .05 limit in the lower tail of a normal sampling distribution.
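The steps above can be sketched in Python with the standard library's `statistics.NormalDist`. As in the footnote example, the sketch assumes a normal population with standard deviation 100 and samples of 25, so that the standard error is 20. It computes the .05 point in the lower tail, applies the decision rule to the two sample means in the text, and then checks by simulation that a true hypothesis is rejected about 5 per cent of the time. The function name `decide` is illustrative.

```python
import random
from statistics import NormalDist, fmean

MU0 = 1000     # hypothesized mean length of life, in kilowatt-hours
SIGMA = 100    # population standard deviation (footnote example)
N = 25         # sample size (footnote example)
ALPHA = 0.05   # coefficient of risk

se = SIGMA / N ** 0.5                          # standard error of the mean: 20.0
critical = NormalDist(MU0, se).inv_cdf(ALPHA)  # lower .05 point, about 967.1

def decide(sample_mean):
    """Reject the hypothesis when the sample mean falls in the region of rejection."""
    return "reject" if sample_mean < critical else "accept"

print(round(critical, 1), decide(965.2), decide(972.3))   # 967.1 reject accept

# Check by simulation: with the hypothesis true, about 5 per cent of
# samples of 25 should land in the region of rejection.
rng = random.Random(0)
trials = 20_000
rejections = sum(
    decide(fmean(rng.gauss(MU0, SIGMA) for _ in range(N))) == "reject"
    for _ in range(trials)
)
print(rejections / trials)
```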

This procedure ensures that the risk of rejecting a given hypo-
thesis when it is true will be just .05. For if it is always followed
in testing hypotheses, false rejections will be made only 5 per
cent of the time, since sample values will fall in regions of rejec-
tion only that often when hypotheses are true.

1 As indicated in Chap. VI, the standard deviation of the sampling dis-
tribution of the mean, i.e., the standard error of the mean, is equal to the
standard deviation of the population divided by the square root of the size
of the sample. For a sample of 25, say, a standard error of 20 implies a
population standard deviation of 100.

It will be noted that there are three arbitrary elements in the
procedure. One is the selection of the statistic to be used, the
second is the selection of the coefficient of risk, and the third
is the selection of the regions of rejection and acceptance. The
choice of the statistic will depend in part on the ease of calcula-
tion and in part on the size and position of the regions of rejection
and acceptance to which it gives rise. The latter consideration
is of prime importance since, as pointed out below, the size and
position of the regions of rejection are a major factor in reducing
the error of accepting a hypothesis when it is not true. This
will be more fully discussed later.

The selection of the coefficient of risk will depend in most
instances upon the nature of the problem. If action based upon
nonacceptance of a hypothesis is not of great significance, the
investigator may be willing to run considerable risk of falsely
rejecting a hypothesis. In testing a certain manufacturing
[...] an occasional overhauling of his process when in fact there is no
real need for such overhauling. In other cases, however, the
expense of overhauling may be very great, and the manufacturer
may be willing to undergo an unnecessary overhauling only very
infrequently and may gladly undertake considerable expense [...]
to avoid any such unnecessary overhauling.

In the former case, the manufacturer might be willing to
[...] .05 where there is no strong
reason for adopting a very high or a very low figure.

The third arbitrary element, the choice of the regions of rejec-
tion and acceptance, is illustrated in Fig. 44. For any given
statistic and coefficient of risk it is possible to pick innumerable
regions of rejection and acceptance. In the previous illustration,
the range of values below 967.1 was taken as the region of rejec-
tion, and the range of values from 967.1 upward was taken as the
region of acceptance. The range of values above 1,000 + 1.645σ,
that is, above 1,032.9, could equally well have been used as a
region of rejection and the range of values from 1,032.9 down-
ward as the corresponding region of acceptance, for the prob-
ability of a sample mean falling above the mean plus 1.645σ is
likewise .05. This probability of .05 could also have been
obtained by splitting the region of rejection so that it included the
range of values below 1,000 - 1.96σ, that is, below 960.8, and
the range of values above 1,000 + 1.96σ, that is, above 1,039.2.
The corresponding region of acceptance in this case would have
been the range of values from 960.8 to 1,039.2, as shown in the
third section of Fig. 44.¹ Still other regions of rejection could
have been found that would have had a probability of .05.

Fig. 44. Three combinations of regions of rejection and acceptance that will make the risk of falsely rejecting the hypothesis X = 1,000 just equal to .05. Top: region of rejection (.05) below 967.1, region of acceptance above. Middle: region of acceptance below 1,032.9, region of rejection (.05) above. Bottom: regions of rejection (.025 each) below 960.8 and above 1,039.2, region of acceptance between.
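The limits of the three regions in Fig. 44 follow from the standard normal percentage points. The Python sketch below assumes, as in the text, a normal sampling distribution with mean 1,000 and standard error 20. It computes each limit and confirms that every region carries the same .05 risk of rejecting a true hypothesis.

```python
from statistics import NormalDist

MU0, SE, ALPHA = 1000, 20, 0.05        # hypothesis, standard error, coefficient of risk
z = NormalDist()                        # standard normal distribution
null = NormalDist(MU0, SE)              # sampling distribution if the hypothesis is true

# Region I: reject below the lower .05 point.
lower = MU0 + SE * z.inv_cdf(ALPHA)
# Region II: reject above the upper .05 point.
upper = MU0 + SE * z.inv_cdf(1 - ALPHA)
# Region III: split the risk, .025 in each tail.
two_lo = MU0 + SE * z.inv_cdf(ALPHA / 2)
two_hi = MU0 + SE * z.inv_cdf(1 - ALPHA / 2)

print(round(lower, 1), round(upper, 1), round(two_lo, 1), round(two_hi, 1))
# 967.1 1032.9 960.8 1039.2

# Each region has the same probability, .05, of rejecting a true hypothesis.
print(null.cdf(lower), 1 - null.cdf(upper), null.cdf(two_lo) + 1 - null.cdf(two_hi))
```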

Any of these regions would have limited the risk of falsely 
rejecting the given hypothesis to the preassigned coefficient of 
risk, viz., .05. They would not all have been equally good with 
respect to the risk of the second kind of error, viz., that of accept- 
ing the given hypothesis when it is not true. It is with the 
problem of selecting regions of rejection and acceptance that 
will minimize this second kind of error that the ensuing dis- 
cussion is concerned. 

Procedure for Minimizing Risk of Error II. If the population 
is not as specified by a given hypothesis, the probability of a 
sample falling in any region of acceptance selected for limiting 
the risk of error I will depend on the position of the region in 
relation to the actual character of the population. If there 
were one particular region of acceptance for which the probability 
of a sample falling within it was less than that of any other 
region giving the same risk of error I, no matter how the popula- 
tion differed from the given hypothesis, this would obviously 
be the best region that could be selected. For, in this case, the 
probability of accepting the false hypothesis would be less than 
it would be for any other region giving the same risk of error I. 

Such a best critical region is usually impossible to determine.
Ordinarily, one region will be the best that can be chosen if the
population differs from the given hypothesis in one direction,
[and another region will] be the best if the population differs
[in another direction]. If the statistician is concerned with [the
possibility of a] population differing from the given hypothesis
[in more than one direction, as is often] possible, some compromise
region [must be chosen].

The selection of good regions of rejection and acceptance will
be illustrated by reference again to the three diagrams of Fig. 44.
Suppose that the hypothesis to be tested is that the mean of the
population from which a sample has been drawn equals 1,000,
and suppose that probability theory, together with certain facts
known about the population, indicates that the sampling dis-
tribution of means of samples from this assumed population will
be a normal distribution with a mean of 1,000 and a standard
deviation of 20. Finally, suppose that the three regions of
rejection and acceptance shown in Fig. 44 are considered for
adoption. The question is which of these regions will be the
best to employ. The answer is that it depends on the nature
of the problem; this can be illustrated by concrete cases.

Suppose that the data of the given example pertain to the
length of life of electric-light bulbs, measured in kilowatt-hours,
producible by some new process. The manufacturer of these
bulbs, it can be argued, will desire especially to avoid acceptance
of the hypothesis that the average length of life of the new bulbs
is 1,000 kilowatt-hours when in fact it is less than that. For if
he accepts the given hypothesis when actually the mean length
of life is greater than 1,000 kilowatt-hours, he stands to lose
nothing; he in fact gets a better process than he expected. But
if he accepts the hypothesis of a mean length of life of 1,000
kilowatt-hours when actually the mean is less than that, he gets
a poorer process than he expected. The quality of his product
will not reach the desired standard, and the result may be a
considerable loss of money. In this instance, therefore, the
manufacturer will want to adopt regions of rejection and accept-
ance that will minimize the risk of accepting the hypothesis
when actually the mean length of life of the bulbs is less than
1,000. This set of regions of acceptance and rejection is shown
in the first diagram of Fig. 44.

1 This is the only type of region that was discussed in J. G. Smith and
A. J. Duncan, Elementary Statistics and Applications, Chap. XII.


By the selection of this set of regions of rejection and accept-
ance, the risk of error II is minimized; this is illustrated by three
diagrams in Fig. 45. The three diagrams in this figure show the
effects of applying the three sets of regions of rejection and
acceptance shown in Fig. 44. The three curves in Fig. 45 show
the true probabilities of obtaining sample means of varying
sizes, forming a normal distribution about the true population
mean of 980, superimposed over the scale showing the regions
of rejection and acceptance according to the hypothesis of a
mean of 1,000, shown in Fig. 44. The diagrams in Fig. 44, from
which the regions of acceptance were obtained, show the prob-
abilities of obtaining sample means of varying sizes, forming a
normal distribution with a mean of 1,000 and a standard devia-
tion of 20.

Figure 45 suggests that the risk of accepting the hypothesis
of 1,000 when the actual mean is less than 1,000 is less for the
region of acceptance running from 967.1 upward than for any
other region that involves the same risk of rejecting the given
hypothesis when it is true (i.e., for any other .05 region). In
each of the three diagrams the crosshatched proportion of the area
under the curve indicates the probability of accepting the
hypothesis that the mean is 1,000 when in fact it is 980, that is,
the probability of accepting a hypothesis when it is not true,
when the region of acceptance for testing the given hypoth-
esis runs (I) from 967.1 upward, (II) from 1,032.9 downward, and
(III) from 960.8 to 1,039.2.

Fig. 45. Risk of accepting hypothesis that population mean is 1,000 when actually it equals 980. I. When region of acceptance runs from 967.1 upward. II. When region of acceptance runs from 1,032.9 downward. III. When region of acceptance runs from 960.8 to 1,039.2.
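The crosshatched areas of Fig. 45 can be computed directly. The Python sketch below assumes that the true sampling distribution of the mean is normal with mean 980 and standard error 20. For each region of acceptance in Fig. 44 it finds the probability that the sample mean falls in that region, which is the risk of error II.

```python
from statistics import NormalDist

MU0, SE, ALPHA = 1000, 20, 0.05   # hypothesis, standard error, coefficient of risk
TRUE_MEAN = 980                   # the actual population mean assumed in Fig. 45
z = NormalDist()
actual = NormalDist(TRUE_MEAN, SE)   # true sampling distribution of the mean

# Limits of the three regions of acceptance in Fig. 44 (unrounded).
lower = MU0 + SE * z.inv_cdf(ALPHA)           # I: accept from here upward
upper = MU0 + SE * z.inv_cdf(1 - ALPHA)       # II: accept from here downward
two_lo = MU0 + SE * z.inv_cdf(ALPHA / 2)      # III: accept between two_lo ...
two_hi = MU0 + SE * z.inv_cdf(1 - ALPHA / 2)  # ... and two_hi

# Risk of error II = probability the sample mean falls in the region of acceptance.
risk_of_error_II = {
    "I": 1 - actual.cdf(lower),
    "II": actual.cdf(upper),
    "III": actual.cdf(two_hi) - actual.cdf(two_lo),
}
for region, risk in risk_of_error_II.items():
    print(f"Region {region}: {risk:.3f}")
```

Under these assumptions the risk is about 0.74 for region I, about 0.83 for region III, and about 0.996 for region II. This agrees with the text's observation that region I gives the smallest risk when the true mean is below 1,000.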

It is clear that this risk is least for the first of these three regions, as shown by the fact that the crosshatched portion of the area under the curve is smallest in the first diagram of Fig. 45. Further study of this kind shows that, whenever the mean of the population is less than 1,000, region I will always give a lower risk of accepting the hypothesis of 1,000 than any other region of acceptance for which the risk of falsely rejecting the hypothesis is .05. For the problem illustrated, therefore, it is the best region of acceptance that may be employed, since the manufacturer wants to minimize the risk of accepting the hypothesis that the mean is 1,000 when in fact it is less.¹ In other instances, one of the other regions might be preferred.

[Fig. 46. Sampling distribution of the mean of a sample when the mean of the population is 1,000 and the standard deviation of the population is 100. Region of rejection, .025 point at each end, when N = 25 compared to when N = 100. Panels: N = 25 (960.8, 1,000, 1,039.2); N = 100 (980.4, 1,000, 1,019.6).]
Effect of the Size of the Sample. In testing hypotheses the size of the sample is of prime importance; for, the larger the sample, the narrower the sampling distribution, i.e., the smaller the standard error, of the statistic used to test a hypothesis. The standard error of the mean, for example, is equal to the standard deviation of the population divided by the square root of the size of the sample. Hence the standard error of the mean varies inversely with the square root of the size of the sample.²


1 Of course, if the risk of rejecting the hypothesis were increased, the risk 
of falsely accepting the hypothesis could be reduced. 
2 See Chaps. XIII, XIV. 
From this inverse relationship between the spread of the sampling distribution and the size of the sample it follows that the larger the sample, the closer the finite limits of the region of acceptance to the central tendency of the sampling distribution [...] that is not true. In other words, the larger the sample, the closer the value of a sample statistic must be to its expected average value if a given hypothesis is to be accepted. This is illustrated by Figs. 46 and 47, which contrast the regions of rejection and acceptance with samples of 25 and 100, respectively.


[Fig. 47. Sampling distribution of the mean when the standard deviation of the population is 100. Region of rejection, .05 point at the lower end, when N = 25 compared to when N = 100. Panels: N = 25 (967.1, 1,000); N = 100 (983.6, 1,000).]
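The narrowing of the regions with sample size can be reproduced with a short Python sketch. It is an illustration only, assuming a normal sampling distribution of the mean and a known population standard deviation of 100; the helper names are hypothetical. It computes the two-sided .05 limits of Fig. 46 and the one-sided .05 limit of Fig. 47 for N = 25 and N = 100.

```python
from statistics import NormalDist

SIGMA = 100      # population standard deviation, assumed known
HYP_MEAN = 1000  # hypothesized population mean

def standard_error(n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return SIGMA / n ** 0.5

def two_sided_region(n, alpha=0.05):
    """Acceptance region with alpha / 2 rejected at each end (as in Fig. 46)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = standard_error(n)
    return HYP_MEAN - z * se, HYP_MEAN + z * se

def lower_tail_bound(n, alpha=0.05):
    """Lower limit of a region of acceptance that rejects only the lower alpha tail (as in Fig. 47)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return HYP_MEAN - z * standard_error(n)

for n in (25, 100):
    low, high = two_sided_region(n)
    print(f"N = {n}: two-sided {low:.1f} to {high:.1f}; one-sided from {lower_tail_bound(n):.1f} upward")
```

Quadrupling the sample halves the standard error, so every limit moves halfway in toward 1,000.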


Since the sampling distribution becomes narrower as the size of the sample increases, the larger the sample, the smaller the probability that a given hypothesis will be accepted when it is not true. This is illustrated in Figs. 48a and 48b. It is principally for this reason that greater confidence is placed in a test making use of a large sample than in one making use of a small sample. In Figs. 48a and 48b, the probabilities of a sample mean occurring in the regions of acceptance are represented by the crosshatched portions of the sampling distributions about the true mean in each case.
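Under the same assumptions (normal sampling distribution, population standard deviation 100), the crosshatched areas of Figs. 48a and 48b can be computed directly. This Python sketch is illustrative only: it takes the one-sided .05 region of acceptance for the hypothesis of 1,000 and finds the probability that a sample mean falls in it when the true mean is 980.

```python
from statistics import NormalDist

SIGMA, HYP_MEAN, TRUE_MEAN = 100, 1000, 980

def prob_false_acceptance(n, alpha=0.05):
    """Probability that the sample mean lands in the one-sided alpha region of
    acceptance for the hypothesis 1,000 when the true mean is 980."""
    se = SIGMA / n ** 0.5
    cutoff = HYP_MEAN - NormalDist().inv_cdf(1 - alpha) * se  # 967.1 for N = 25, 983.55 for N = 100
    return 1 - NormalDist(TRUE_MEAN, se).cdf(cutoff)

for n in (25, 100):
    print(f"N = {n}: {prob_false_acceptance(n):.3f}")
```

On these assumptions the risk of accepting the false hypothesis falls from about .74 with 25 observations to about .36 with 100.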

[Fig. 48a. Standard deviation of the population known to be 100; if the hypothesis is that the mean of the population is 1,000, the range of values downward from 967.1 will be a .05 region of rejection for testing this hypothesis with reference to a sample of 25. The range of values upward from 967.1 will be the corresponding region of acceptance. This figure shows the probability of a mean falling in this region of acceptance when the true mean is 980 and not 1,000. (N = 25; axis marks at 967.1, 980, and 1,000.)]

[Fig. 48b. Standard deviation of the population known to be 100; if the hypothesis is that the mean of the population is 1,000, the range of values downward from 983.5 will be a .05 region of rejection for testing this hypothesis with reference to a sample of 100. The range of values upward from 983.5 will be the corresponding region of acceptance. This figure shows the probability of a sample falling in this region of acceptance when the true mean is 980 and not 1,000. (N = 100; axis marks at 980, 983.5, and 1,000.)]

Effect of the Statistic Selected. As pointed out above, the procedure for testing a given hypothesis may be varied by taking different sample statistics. A hypothesis regarding the mean of a normal population, for example, may be tested by using either the mean or the median of the sample. The former has the advantage, however, of having a sampling distribution that is narrower than that of the median. In other words, the standard error of the mean is less than the standard error of the median.¹ The use of the mean in preference to the median, therefore, is the equivalent of employing a larger sample; or, to put it another way, as good a test can be made with the mean of a smaller sample as with the median of a larger sample. The former is accordingly the better statistic to employ for the test.

1 This result may be reversed for other than normal universes.
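The size of this advantage can be sketched with the familiar large-sample result that, for a normal population, the variance of the sample median is approximately π/2 times that of the sample mean. The Python below is a rough illustration under that approximation; it is not exact for small samples, and, as the footnote warns, it need not hold for non-normal populations. The helper names are hypothetical.

```python
import math

SIGMA = 100  # population standard deviation (illustrative)

def se_mean(n):
    """Standard error of the sample mean."""
    return SIGMA / math.sqrt(n)

def se_median_large_sample(n):
    """Approximate standard error of the sample median for a normal population
    (large-sample result: variance about (pi / 2) * sigma**2 / n)."""
    return math.sqrt(math.pi / 2) * SIGMA / math.sqrt(n)

def median_sample_size_matching_mean(n_mean):
    """Approximate sample size at which the median is as precise as the mean of n_mean."""
    return n_mean * math.pi / 2

print(round(se_median_large_sample(25) / se_mean(25), 3))   # ratio of standard errors
print(round(median_sample_size_matching_mean(100)))         # median sample matching a mean of 100
```

On this approximation the median's standard error is about 1.25 times that of the mean, so the median of roughly 157 observations does about as well as the mean of 100.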

Estimation of Population Parameters. Often the statistician is interested in more than the testing of a single hypothesis. In some cases, no particular hypothesis is suggested by the problem. Even if it is, the statistician may wish to know, not only whether a particular hypothesis is acceptable on the basis of the sample return, but also what hypotheses in general are acceptable and what are not. More exactly, he may wish to draw a boundary for which it can be said that the chances are, say, 95 out of 100 that this boundary includes the true population value. In addition, he may want to select a single hypothesis as the best hypothesis to be adopted. The theory of estimation seeks an answer to these questions.

Specification of a Population. A population is precisely specified when its functional form is given together with the values of its parameters. A normal population, for example, is precisely specified when the fact of its normality is given, together with the values of its mean and standard deviation. In many cases, however, a population will be reasonably well determined if its more important parameters are given, such as its mean, standard deviation, β₁, and β₂. For in this case a Pearsonian curve or a Gram-Charlier curve can be derived from which the probabilities of the population can be approximately computed. Thus, if the form of a population is known, estimates of its parameters will exactly specify it; if the form is not known, estimates of its parameters will at least approximately determine the population. Hence, estimates of a population become largely estimates of its parameters.¹

In the ensuing discussion it is assumed that the form of the population is known a priori. This will make for greater simplicity and precision in the analysis and will facilitate the presentation of the argument. It will not affect any of the general conclusions of this chapter.

Confidence Intervals. A principal consideration in the theory of estimation is to determine confidence intervals for a population parameter. A confidence interval for a given parameter is a range of values that, on the basis of a given sample, has a specified probability of including the true value. The probability associated with any confidence interval is called the “confidence coefficient” for the interval.

1 In many instances, knowledge of a particular parameter or set of parameters is all that is desired, so that full knowledge of the population is not required.

The procedure by which such a confidence interval for a given parameter may be obtained will now be traced step by step:

1. Assume at the start that the form of the population is known a priori. For simplicity, also assume that the population is characterized by the value of a single parameter, which may be designated as θ.
2. Let some sample statistic s be chosen for the purpose of estimating the value of θ. What statistic is chosen is immaterial to the present argument.
3. From knowledge of the population, derive by use of the probability calculus the sampling distribution of s for samples of the given size. This will give the relative frequencies with which sample s's may be expected to have different values. Since the sampling distribution of s is derived from the original population, the precise nature of the distribution will depend on the value of the parameter θ. The way in which a sampling distribution might vary with variations in a parameter θ is suggested in Figs. 49 to 52.
4. Let the confidence coefficient for the given interval be set at .95. This means that the interval to be determined should have a probability of .95 of covering the true value of θ.
5. Calculate the value of s for the sample actually obtained. Then determine the value of the parameter θ that will cause the upper .025 point of the sampling distribution of s (as noted in item 3, this distribution varies with the value assigned to θ) to coincide with the computed sample value of s. Such a value of θ is pictured in Fig. 51. Call this value of θ “θ lower,” and designate it by θ̲. Next select a value of θ that will cause the lower .025 point of the sampling distribution of s to coincide with the computed sample value of s. A value of θ that will give this result is pictured in Fig. 52. Call this “θ upper,” and designate it θ̄. The interval θ̲ to θ̄ is the desired confidence interval. For if the foregoing procedure is always followed in setting up confidence intervals, the true value of θ will fail to be covered only in those instances in which the sample value of s falls in the upper or lower .025 tail of the true sampling distribution of s; and this will occur on the average only 5 times out of 100. Hence, on the average, the confidence interval θ̲ to θ̄ will cover the true value of θ 95 times out of 100. The values θ̲ and θ̄ are called the “lower” and “upper confidence limits” of θ.

[Fig. 49. A given sample in a region of acceptance.]
[Fig. 50. A given sample in a region of rejection.]
[Fig. 51. Lower confidence limit.]
[Fig. 52. Upper confidence limit.]
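Step 5 amounts to solving two equations in θ. The Python sketch below does this by bisection for any family of sampling distributions that shifts to the right as θ increases; the helper names `solve_theta` and `confidence_limits` are hypothetical, introduced only for this illustration, and the monotonic shift is an assumption of the sketch. For concreteness it uses a normal sampling distribution with standard error 20 and an observed s of 965, the figures of the electric-light bulb example worked out later in this section.

```python
from statistics import NormalDist

def solve_theta(point_of_s, s_obs, lo, hi, tol=1e-9):
    """Bisection for the theta in [lo, hi] at which point_of_s(theta) equals s_obs.
    Assumes point_of_s increases with theta, i.e. raising theta shifts the
    sampling distribution of s to the right."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if point_of_s(mid) < s_obs:
            lo = mid      # the chosen point is still below s_obs: theta must be larger
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_limits(sampling_dist, s_obs, lo, hi, tail=0.025):
    """Step 5: theta lower puts s_obs at the upper `tail` point of the sampling
    distribution of s; theta upper puts it at the lower `tail` point.
    `sampling_dist(theta)` must return an object with an inv_cdf method."""
    theta_lower = solve_theta(lambda t: sampling_dist(t).inv_cdf(1 - tail), s_obs, lo, hi)
    theta_upper = solve_theta(lambda t: sampling_dist(t).inv_cdf(tail), s_obs, lo, hi)
    return theta_lower, theta_upper

# Illustration: s is a sample mean, normal about theta with standard error 20;
# the observed value of s is 965.
limits = confidence_limits(lambda t: NormalDist(t, 20), 965, 800, 1200)
print([round(x, 1) for x in limits])
```

Nothing in the search depends on normality; only the family passed as `sampling_dist` would change for another statistic.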


In conclusion, it may be noted that there are three arbitrary elements in the procedure for the determination of confidence intervals, just as there were three arbitrary elements in the procedure for testing a hypothesis. One is the choice of the sample statistic s to be used; the second is the selection of the confidence coefficient; and the third is the selection of the points of the sampling distribution of s that are to coincide with the sample value.

Since the sampling distributions of some statistics are narrower than those of others, for the same confidence coefficient a smaller confidence interval can be obtained in some cases than in others.¹ Some statistics thus provide more accurate estimates of population values than others.

Confidence intervals can be made smaller² if the confidence coefficient is smaller. Thus, in a given instance one may be able to say that the chances are 80 out of 100 that the interval 50-70 includes the true value, or one might be able to say that the chances are 95 out of 100 that the interval 30-90 includes the true value. If the investigator is willing to run greater chances of being wrong, he may thus reduce the size of the interval that is said to include the true value. In matters involving life and death a very high confidence coefficient should be adopted. In testing to discover the effect of an advertising campaign, on the other hand, a much smaller coefficient might be used.

1 Or if, as noted on pp. 177-178, a confidence interval has only one finite bound, this finite bound will be closer to the sample value.
2 Or the finite bound of an interval that has one infinite bound can be brought closer to the sample value.

The third arbitrary element in the determination of confidence intervals is the selection of the points of the sampling distribution of s to serve as the points of coincidence with the sample value of s. In the analysis above, the upper and lower .025 points were chosen. For a confidence coefficient of .95 any other set of points that included 95 per cent of the sample values of s could have been used, such, for example, as the lower .01 point and the upper .04 point, or the lower .03 point and the upper .02 point. If an upper or lower limit only is desired for a confidence interval, the lower or upper .05 point of the distribution of s can be used. The particular points that are chosen in any case will depend on the problem in hand. Normally in estimating θ both an upper and lower bound will be desired. In some instances, however, the statistician may not care if he overestimates a certain figure, but he may wish especially not to underestimate it. In other instances, he may wish particularly to avoid overestimation.
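To see how the third arbitrary element affects the result, the Python sketch below compares the length of .95 intervals built from the three pairs of tail points mentioned in the text. It assumes a normal sampling distribution with standard error 20, as in the electric-light bulb example that follows; `interval_width` is a hypothetical helper. For a normal sampling distribution, splitting the .05 equally between the two tails gives the shortest of the three intervals.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def interval_width(lower_tail, upper_tail, se=1.0):
    """Length of a confidence interval for a normal mean when the sample value
    is matched to the lower `lower_tail` point (giving the upper limit) and to
    the upper `upper_tail` point (giving the lower limit)."""
    return se * (STD_NORMAL.inv_cdf(1 - lower_tail) + STD_NORMAL.inv_cdf(1 - upper_tail))

# Tail pairs named in the text; each leaves .95 between the chosen points.
for lower, upper in [(0.025, 0.025), (0.01, 0.04), (0.03, 0.02)]:
    print(f"lower {lower}, upper {upper}: width {interval_width(lower, upper, se=20):.2f}")
```

The unequal splits are still valid .95 intervals; they are simply longer, and they are worth choosing only when one direction of error matters more than the other.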

The whole process of determining confidence intervals may be illustrated by reference to the previously used electric-light bulb problem. Suppose that a sample batch of 25 bulbs has a mean length of life of 965 kilowatt-hours, and suppose that the manufacturer desires to determine limits for the true mean length of life that have a probability of .95 of covering this true value. To simplify the analysis, suppose that the standard deviation in length of life of the bulbs is known to be 100 kilowatt-hours and that only the mean is unknown.¹

To determine confidence limits for the mean of the population given a sample mean of 965 kilowatt-hours, the procedure is as follows: First identify the mean of the population as the unknown parameter θ, and let the statistic s, which will be used to estimate θ, be the mean of the sample. Next note that the sampling distribution of the means of random samples from a normal population can be shown by the probability calculus to be a normal distribution with a mean equal to the mean of the population and a standard deviation equal to the standard deviation of the population divided by the square root of N, the size of the sample. In the problem in hand, therefore, the sampling distribution of s will have a standard deviation of 100/√25 = 20 and a mean whose value is unknown but equals θ.

As the third step in the analysis, note that the value of the mean of the population that will cause the upper .025 point of the sampling distribution of s to coincide with the given sample value of s is 925.8. For the upper .025 point of the sampling distribution of the mean lies just 1.96 standard errors above the mean of the population, i.e., at 1.96 times 20 = 39.2 above the mean of the population. If this is to coincide with the given sample mean of 965, then the population mean must have the value 965 - 39.2 = 925.8. The value 925.8 will thus be taken as the lower confidence limit for the mean of the population.

1 Knowledge of the standard deviation might be obtained from technical considerations.
Similarly, note that the value of the mean of the population that will cause the lower .025 point of the sampling distribution of sample means to coincide with 965 is 965 + 39.2 = 1,004.2. The value 1,004.2 may thus be taken as the upper confidence limit for the mean of the population. The whole confidence interval will thus be 925.8 to 1,004.2, and it may be said that the chances are 95 out of 100 that this interval includes the true population mean.

If the statistician wished to make an especially conservative report to the manufacturer about the new process, he might choose a confidence interval that has only an upper bound (but still a confidence coefficient of .95). This upper bound would be determined by finding the value of the mean of the population that caused the given sample value to fall at the lower .05 point of the sampling distribution of the mean. For the problem illustrated, this upper bound would be given by 965 + 1.645 times 20 = 997.9 (for the .05 point of a normal distribution comes at 1.645 standard deviations from the mean). The statistician would in this instance report to the manufacturer that there is a chance of 95 out of 100 that the range of values up to 997.9 includes the true value, or, to put it another way, that there is only a chance of 5 out of 100 that the range of values above 997.9 includes the true value.
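The arithmetic of the electric-light bulb example can be reproduced in a few lines of Python. This is a sketch that assumes, as the text does, a normal sampling distribution of the mean with a known population standard deviation of 100; it uses only the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

SIGMA = 100         # known population standard deviation (kilowatt-hours)
N = 25              # bulbs in the sample batch
SAMPLE_MEAN = 965   # observed mean length of life

se = SIGMA / N ** 0.5                      # 100 / sqrt(25) = 20
z_two_sided = NormalDist().inv_cdf(0.975)  # about 1.96
z_one_sided = NormalDist().inv_cdf(0.95)   # about 1.645

lower = SAMPLE_MEAN - z_two_sided * se             # theta lower: 965 at the upper .025 point
upper = SAMPLE_MEAN + z_two_sided * se             # theta upper: 965 at the lower .025 point
one_sided_upper = SAMPLE_MEAN + z_one_sided * se   # 965 at the lower .05 point

print(round(lower, 1), round(upper, 1), round(one_sided_upper, 1))
```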

Effect of Size of Sample. For confidence intervals making use of the same statistic, the same confidence coefficient, and the same type of interval, the larger the sample the smaller the confidence interval, or, in the case of an interval having only an upper or lower bound, the closer the bound to the sample value. That is, a larger sample is always a better means of estimating the character of a population than a smaller sample. Of course, large samples may be more costly to procure than small ones, and their greater accuracy may not be worth the additional expense. In all cases, however, improvement in estimates to be obtained from larger samples should be given consideration, for the gain from increased size may far overbalance the higher cost.
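A brief Python illustration of the size effect, assuming the normal-mean setting of the electric-light bulb example with a known population standard deviation of 100 (`ci_width` is a hypothetical helper). Because the width of the interval is proportional to 1/√N, quadrupling the sample halves the .95 interval.

```python
from statistics import NormalDist

SIGMA = 100
Z = NormalDist().inv_cdf(0.975)   # about 1.96

def ci_width(n):
    """Width of the .95 two-sided interval for a normal mean with known sigma."""
    return 2 * Z * SIGMA / n ** 0.5

for n in (25, 100, 400):
    print(n, round(ci_width(n), 1))
```

Since each halving of the width costs four times the observations, the comparison of gain against cost suggested in the text can be made directly from such figures.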

Maximum-likelihood Estimates of Population Parameters. Sometimes a problem requires something further than setting up a range of values that probably includes the true value of a population parameter. It may be desirable to have a single figure that can be considered the “best,” or “optimum,” estimate that can be made of the population parameter from a given sample.

It should be noted at the start that various methods may be employed to estimate population parameters, and it is doubtful if any single method is the best for all circumstances. There is one method, however, the method of maximum likelihood, that has been received with considerable favor, and it is this method that will be described here.

The method of maximum likelihood reasons entirely from the 
sample to the population. It consists in estimating the values 
of the population parameters so that the probability of the given 
sample among all possible samples of the same size is a maximum. 
It will be noted that, if the values of the population parameters 
are given, a particular set of sample values will have a definite 
probability. As the parameter values are changed, the prob- 
ability of the given sample values also changes. According to 
the method of maximum likelihood, the “best estimates” that
can be made of unknown population parameters, given a par- 
ticular sample, are those values of the parameters which, if they 
were the true values, would make the probability of the given 
sample a maximum. 

Mathematically the method of maximum likelihood may be described as follows: Suppose that for given values of the population parameters θ, θ′, θ″, . . . the probability of a sample consisting of X₁, X₂, . . . , X_N is given by the probability distribution F(X₁, X₂, . . . , X_N; θ, θ′, θ″, . . .). Then for given values of X₁, X₂, . . . , X_N, that is, for a given sample, the best estimates that can be made of θ, θ′, θ″, . . . are those values that maximize the logarithm of the function F. The logarithm of F, instead of F itself, is maximized because the mathematical analysis is easier. The results are the same, however, for F is a maximum when log F is a maximum. If the form of F is known, these maximizing values can be found by the use of the differential calculus. Thus the optimum estimates of θ, θ′, θ″, . . . must satisfy the conditions

$$\frac{\partial \log F}{\partial \theta} = 0, \qquad \frac{\partial \log F}{\partial \theta'} = 0, \qquad \frac{\partial \log F}{\partial \theta''} = 0, \qquad \ldots$$

These give as many equations as there are parameters, and their common solution affords the desired estimates.
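As an illustration not found in the original text: for a normal population with θ the mean and θ′ the standard deviation, solving the two equations gives the sample mean and the sample standard deviation computed with divisor N. The Python sketch below uses a small hypothetical sample, computes those closed-form solutions, and checks that moving either parameter away from them lowers log F. The function names are hypothetical.

```python
import math

def log_likelihood(xs, mu, sigma):
    """log F for independent observations from a normal population."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi) - ss / (2 * sigma ** 2)

def normal_mle(xs):
    """Closed-form solution of d log F / d mu = 0 and d log F / d sigma = 0."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)   # divisor N
    return mu_hat, sigma_hat

sample = [962, 1011, 948, 990, 1023, 975, 931, 1004]   # hypothetical data
mu_hat, sigma_hat = normal_mle(sample)
best = log_likelihood(sample, mu_hat, sigma_hat)

# Moving either parameter away from the solution lowers log F.
for dmu, dsig in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
    assert log_likelihood(sample, mu_hat + dmu, sigma_hat + dsig) < best

print(round(mu_hat, 2), round(sigma_hat, 2))
```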
It will be noted that the values of the population parameters that are found by the method of maximum likelihood are values that depend on the given sample values. In other words, the estimate of each population parameter is expressed as a certain “function” of the sample values. These functions of the sample values constitute “statistics,” for that is what a statistic is, a function of the observed sample values. Usually it is found that the maximum-likelihood estimate of a given population parameter is the sample counterpart of the parameter itself. If the population is normal, for example, the mean and standard deviation of the sample are the maximum-likelihood estimates, respectively, of the mean and standard deviation of the population.

Sometimes it is desired to estimate a population parameter in a way that will be independent of the estimates of other parameters. Such an independent estimate of a population parameter may be made under either of two conditions. If it turns out that in making the maximum-likelihood estimates one of the partial derivatives gives an equation containing only a single parameter, then this parameter may be estimated immediately without troubling with the estimates of the other parameters.

The second condition permitting an independent estimate of a population parameter may be described as follows: Sometimes it happens that the function F giving the probability of a given sample can be broken up into the product of two factors, one of which depends only on the sample values and on a single population parameter. If this is true, the value of that parameter can be estimated independently of the other parameters by choosing the value that maximizes this part of the total probability. For example, it so happens that the probability of a sample from a normal population can be factored into a part that depends only on the sample values and on the standard deviation of the population. The other part depends on the sample values and on both the mean and the standard deviation of the population. By taking as an estimate of σ the value that maximizes the first factor, there will be obtained a maximum-likelihood estimate of the population standard deviation that is independent of the population mean. When the mathematics of this is actually carried out,¹ it is found that this independent estimate of the population standard deviation is

$$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}}$$

In other words, the maximum-likelihood estimate of the standard deviation of a normal population that is independent of the population mean is given by the standard deviation of the sample multiplied by $\sqrt{N/(N-1)}$. When the mean and standard deviation are estimated jointly, the maximum-likelihood estimate of the standard deviation of the population is the standard deviation of the sample. The factor $\sqrt{N/(N-1)}$ that occurs in the independent estimate may thus be viewed as a correction factor that makes allowance for neglect of the estimate of the population mean.

1 See pp. 290-294.
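The factorization can be illustrated numerically. In this Python sketch, which is an assumption-laden illustration rather than the derivation referred to in the footnote, the factor that involves σ alone is taken to be the sampling distribution of the sum of squared deviations Σ(X - X̄)², a chi-square form with N - 1 degrees of freedom; dropping terms free of σ, its logarithm is -(N - 1) log σ - Σ(X - X̄)²/(2σ²). Maximizing it by a golden-section search recovers the independent estimate given above. The sample and the helper names are hypothetical.

```python
import math

def sigma_factor_loglik(sigma, ss, n):
    """Log of the part of the normal likelihood that involves sigma but not the
    mean (chi-square form with n - 1 degrees of freedom, sigma-free terms dropped)."""
    return -(n - 1) * math.log(sigma) - ss / (2 * sigma ** 2)

def maximize(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2   # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if f(c) > f(d):
            b = d      # the maximum lies in [a, d]
        else:
            a = c      # the maximum lies in [c, b]
    return (a + b) / 2

sample = [962, 1011, 948, 990, 1023, 975, 931, 1004]   # hypothetical data
n = len(sample)
mean = sum(sample) / n
ss = sum((x - mean) ** 2 for x in sample)
s = math.sqrt(ss / n)   # joint maximum-likelihood estimate (divisor N)

independent = maximize(lambda sg: sigma_factor_loglik(sg, ss, n), 1e-6, 10 * s)
print(round(independent, 3), round(s * math.sqrt(n / (n - 1)), 3))
```

The two printed values agree to within the precision of the numerical search, since the factor is very flat near its maximum.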

Maximum-likelihood Statistics as “Optimum” Statistics. Statistics derived by the method of maximum likelihood are deemed to be “optimum statistics” in certain senses.¹ First, these statistics are said to be “consistent.” By this it is meant that as the size of the sample is increased the sample statistic approaches closer and closer to the population parameter it is used to estimate. A more precise statement is that the sampling distribution of the given statistic becomes more and more concentrated around the value of the population parameter, and the probability of any given finite deviation from the population value becomes less and less. Such an approach of the sample value to the population value as the size of the sample is increased is spoken of as “stochastic convergence.” A maximum-likelihood estimate is thus a consistent statistic in that it approaches the population value stochastically.
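For the mean of a sample from a normal population, the stochastic convergence just described can be written down exactly: the probability of a deviation from the population mean larger than d is 2[1 - Φ(d√N/σ)], where Φ is the normal distribution function. The Python sketch below uses σ = 100 and d = 20 as illustrative choices and shows that probability shrinking toward zero as N grows.

```python
from statistics import NormalDist

def prob_deviation_exceeds(d, n, sigma=100):
    """P(|sample mean - population mean| > d) for random samples of n from a
    normal population with standard deviation sigma."""
    se = sigma / n ** 0.5
    return 2 * (1 - NormalDist().cdf(d / se))

for n in (25, 100, 400, 1600):
    print(n, f"{prob_deviation_exceeds(20, n):.6f}")
```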

Maximum-likelihood estimates are also optimum statistics in that they are “efficient.” By this it is meant that in the limit as the size of the sample is indefinitely increased there is no other statistic that has a smaller sampling variance. For a finite sample some other statistic used to estimate the same population parameter may have a somewhat smaller sampling variance, but as the size of the sample is increased the sampling variance of the maximum-likelihood statistic will tend to become at least as small as that of the other statistic. Maximum-likelihood statistics are thus said to have sampling variances that are a minimum asymptotically.

1 See FISHER, R. A., “On the Mathematical Foundations of Theoretical Statistics,” Philosophical Transactions of the Royal Society of London, Series A, Vol. 222 (1922), pp. 309-368.

A third optimum characteristic of maximum-likelihood estimates is their “sufficiency.” This characteristic ensures that, when a maximum-likelihood estimate has been made of a population parameter, no further information about that parameter can be obtained from any statistic that is independent of (i.e., not functionally related to) the maximum-likelihood statistic. Again this characteristic is obtained only in the limit as the size of the sample is indefinitely increased.

Maximum-likelihood statistics are thus optimum statistics in three ways. They are consistent, efficient, and sufficient.


STRATIFIED, OR REPRESENTATIVE, RANDOM SAMPLING 


Often sampling fluctuations can be greatly reduced by use of partial or supplementary knowledge about a population instead of relying entirely upon the sampling process itself. Suppose, for example, that it is known that, with respect to the division of public opinion on a given question, religious preference is an important influencing factor. And suppose further that census data are available giving the number of people in the given community belonging to each religious category. If reliance were placed upon random sampling alone to give a representative sample of public opinion, then it would be hoped that the sample was representative regarding the proportion of adherents in each religious category as well as of the division of public opinion within each. When knowledge of the relative proportions of people in each religious category is available, however, there is no need to rely upon sampling for it. In this case, the sampling can be so devised that the number of people sampled in each group is proportional to the number of people in each group in the whole population. The randomness of the sampling will in this instance be restricted to the selection of the individuals within each religious category. This is known as “stratified” or “representative” random sampling. It is especially important in public-opinion analysis, index-number construction,¹ and the like.

The significance of stratified, or representative, random sampling is that it reduces sampling errors. It makes use of knowledge of the correlation between the variable which is being studied and one or more other variables which are correlated with this variable and about which information is available. By using this correlation it diminishes the extent of the chance fluctuations.²
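The gain from stratification can be sketched for a proportion. Under proportional allocation, and ignoring the finite-population correction, the variance of the estimated overall proportion is Σ W_h p_h(1 - p_h)/n instead of p(1 - p)/n, where W_h is the population share of stratum h and p_h the proportion within it; the difference, Σ W_h(p_h - p)²/n, is the between-strata variation that stratification removes. In the Python sketch below the census counts, the within-group proportions, and the helper names are all hypothetical, chosen only for illustration.

```python
# Hypothetical census counts and within-group proportions (illustration only).
census = {"group A": 25_000, "group B": 15_000, "group C": 10_000}
share_in_favor = {"group A": 0.60, "group B": 0.40, "group C": 0.20}

def proportional_allocation(counts, n):
    """Number to sample in each stratum, proportional to its population size.
    (Rounding may make the total differ slightly from n in other examples.)"""
    total = sum(counts.values())
    return {k: round(n * c / total) for k, c in counts.items()}

def variances(counts, p, n):
    """Variance of the estimated overall proportion under simple random sampling
    and under proportional stratified sampling (finite-population correction ignored)."""
    total = sum(counts.values())
    w = {k: c / total for k, c in counts.items()}
    p_all = sum(w[k] * p[k] for k in counts)
    srs = p_all * (1 - p_all) / n
    stratified = sum(w[k] * p[k] * (1 - p[k]) for k in counts) / n
    return srs, stratified

n = 1_000
print(proportional_allocation(census, n))
srs, strat = variances(census, share_in_favor, n)
print(f"simple random: {srs:.6f}  stratified: {strat:.6f}")
```

The more the groups differ in opinion, the larger the reduction; if every group held the same proportion, stratification would gain nothing.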


PURPOSIVE SAMPLING 

Some sampling methods do not employ the random technique but select their samples to conform to chosen criteria. This is spoken of as “purposive sampling.” For example, suppose a given research bureau is in search of some particular knowledge regarding the urban population of the United States. To get this it may go over carefully all the cities in the country and select a city that is “typical,” say, as to size, as to proportion of heavy and light industries in the city, as to percentage of foreign born, etc. This city will then be taken as a typical sample of American cities, and the data it reveals on the problem in question will be considered as typical of the American urban population.

Purposive sampling is also used in combining census data in various ways, in order to save the time and money that would be required by use of the complete data. In the Danish census of 1923, for example, it was decided to pick a representative sample that was to be about 20 per cent of the whole country and also 20 per cent of the total in each of the country's 22 counties. The procedure for securing this sample was as follows:³

For each of the 1,300 parishes in the country, the number of cows per 100 hectares of farm area was computed, and the parishes in each county were grouped according to the magnitude of this ratio, five groups being distinguished in each county. From each of these groups the parish was selected whose agricultural area most nearly amounted to one-fifth the total agricultural area of the group. In some cases, two or more parishes were combined or a single parish divided in order to get this 20 per cent sample. Next the parishes so selected were drawn on a map to see if the various parts of the country were about equally well represented. When this was not the case, certain parishes with approximately the same number of cows per hectare as parishes included in the sample were substituted for the latter so as to improve the geographical distribution. Finally this preliminary sample was tested with regard to special factors such as the number of farms, total agricultural area, grain area, number of milch cows, number of pigs, etc., to see whether the sample represented with respect to these factors approximately one-fifth the whole country. Where there were marked divergencies in this respect, adjustments were made to make the sample approximately 20 per cent of the whole for these auxiliary items without at the same time disturbing the area relationship or the geographical representation.

1 See SMITH and DUNCAN, op. cit., Chap. XIX.
2 It will be recalled that the variation around a line of regression, i.e., the scatter, or second-order variance, is always less than the total variation in the dependent variable.
3 See JENSEN, ADOLPH, “The Representative Method in Practice,” Bulletin de l'Institut international de statistique, tome 22 (1926), 1ère livraison, pp. 420-421.


Purposive sampling of this kind is hard to appraise. The argument is that if a sample is representative in certain respects it will be representative in other respects, but this is not a necessary consequence. Samples of this kind depend largely on the judgment of those making them and are worth just about as much.¹

1 Cf. remarks on p. 154.


CHAPTER IX 
SAMPLING FROM A DISCRETE TWOFOLD POPULATION 


Sampling from a discrete population in which all the elements fall into one or the other of two categories is one of the simplest sampling problems. This type of problem is illustrated when public opinion is being investigated, when a manufacturer is testing the quality of a given production process, or when a sociologist is determining the ratio of male to female births.

Sampling from a discrete twofold population affords an opportunity to study in a more detailed yet relatively simple way most of the aspects of sampling theory sketched in the previous chapter. The following discussion consequently offers an easy approach to the general principles of sampling, and the reader is urged to master it completely. The arguments, for the most part, will follow the essential steps that were outlined in the preceding chapter. Chapter XIII generalizes the argument for a manifold population.


THE PROBLEM AND INITIAL ASSUMPTION 

To avoid excessive abstraction the argument will be expounded with reference to a concrete example. Suppose that a manufacturer has produced an order of machine parts numbering several hundred thousand pieces. According to the standards of the trade an order of this kind is not acceptable if more than 10 per cent of the parts are defective. Routine inspection of each part is costly, however, and the only feasible method of testing the lot is to take a sample. The problem then arises as to what inferences can be made about the percentage of defectives in the whole lot from the determination of the percentage of defectives in the sample.

If the sample yields 11 per cent defectives, is it a reasonable hypothesis that the percentage of defectives in the whole lot is only 10 per cent? What are the limits within which the percentage of defectives in the whole lot may reasonably be considered to lie? What is the best estimate that can be made of the percentage of defectives in the whole lot? It is with these questions that the following analysis is concerned.

Assumptions. In selecting the size of the sample to be tested the manufacturer is pulled in opposing directions. The larger the sample tested, the greater the expense to the manufacturer. He will seek, therefore, to test as small a sample as is practicable. On the other hand, in choosing a small sample he increases the risk of having the whole lot rejected when in fact it may be above standard.¹

In the ensuing argument it will be assumed that the sample selected is small relative to the size of the population, although it may be large absolutely. More precisely, it will be assumed that the withdrawal of the sample parts does not materially change the percentage of defective and nondefective parts in the whole lot. Suppose, for example, that the initial lot consists of 500,000 parts and that a sample of 2,500 is taken. If 10 per cent of the whole lot were defective and 40 per cent of the sample were defective (an extremely unlikely result), the percentage of defectives in the whole lot after the entire sample had been taken would still be 9.85 (49,000 defectives among the remaining 497,500 parts), which differs but little from 10 per cent. In what follows it will be assumed that, as the sample is drawn, the percentage of defectives in the whole lot remains unchanged. This will greatly simplify the analysis.
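The 9.85 per cent figure can be checked with a few lines of arithmetic. The following minimal Python sketch assumes the numbers of the example (a lot of 500,000 parts, 10 per cent defective, and a sample of 2,500 that is 40 per cent defective); the helper name remaining_defective_share is hypothetical.

```python
def remaining_defective_share(lot_size, lot_share, sample_size, sample_share):
    """Share of defectives left in the lot after a sample is withdrawn
    without replacement (hypothetical helper for the example in the text)."""
    defectives_left = lot_size * lot_share - sample_size * sample_share
    parts_left = lot_size - sample_size
    return defectives_left / parts_left


# 500,000 parts, 10 per cent defective; a sample of 2,500 that is 40 per cent defective.
share = remaining_defective_share(500_000, 0.10, 2_500, 0.40)
print(f"{100 * share:.2f} per cent")  # 9.85 per cent
```

Even this extreme sample moves the lot's percentage by only .15, which is why the sketch supports treating the population percentage as fixed while the sample is drawn.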

It will further be assumed that the process of sampling is a random one, as described in the previous chapter. The full implication of this will be understood only as the analysis proceeds. For the moment it may be noted that with randomness the results of repeated sampling should be predictable by mathematical probabilities calculated from some appropriate mathematical model.


1 See pp. 171-172.
2 For a discussion of the complications that arise when this assumption cannot be made, see pp. 209-211.


DISTRIBUTION OF SAMPLE PERCENTAGES

In the problem in hand the population of parts is divided into two groups, defective parts and nondefective parts; it is therefore a twofold population. The problem is to make inferences regarding the percentage of defective parts in the population from a sample set of data. The only sample statistic that is appropriate for making inferences about the population percentage is the percentage of defective parts in the sample. For subsequent analysis, therefore, it will be necessary to have the distribution of sample percentages from a twofold population, i.e., a formula giving the probabilities of sample percentages taking on various values in repeated random sampling from a given twofold population. The derivation of this sampling distribution is the next task.

Derivation of the Sampling Distribution. To derive a sampling distribution it is necessary to find a mathematical model that symbolizes the conditions of sampling and to use this model for the calculation of the desired probabilities. In drawing up the model, therefore, the first step is to note carefully the conditions of sampling. In the present problem these may be described as follows:

Conditions of Sampling. Prior to the withdrawal of any sample let the percentage of defective parts in the whole population be p₁, and let the percentage of nondefective parts be p₂. After the withdrawal of the first member of a sample, the percentages of defective and nondefective parts remaining in the population will not, if the population is finite, be exactly p₁ and p₂. It is one of the fundamental assumptions of the problem, however, that this change is negligible, so that the sampling may be treated as sampling with replacement. The following analysis will assume, then, that a sample item is replaced before the next item is drawn.

The immediate problem is a study of what will happen under repeated sampling. If the sampling process is random, each member of the population will have an equal chance of being drawn on every occasion. This means that as a large number of samples are drawn, with replacements, every member of a population will presumably be drawn sometime or other with every member of every other population, and in the long run no particular combination of members will appear any more frequently than any other combination.¹

The Mathematical Model. These conditions of sampling suggest that the sampling distribution of percentages may be derived from the following model: Suppose there are ten packs of cards in each of which p₁ per cent are black cards and p₂ per cent are white cards. All possible combinations of 10 cards each are formed by combining without restriction each card of each pack with every card of every other pack. Since the probability (relative frequency) of a black card in each of the 10 packs from which the various cards are selected is p₁ and the probability of a white card is p₂, and since the combination of cards is without restriction so that the selection of a black or white card from any pack does not affect the probabilities in other packs, then, according to the multiplication theorem, the probability of a combination in which, for example, the first 4 cards are black and the last 6 are white is p₁⁴p₂⁶. But the same would be true for any combination having 4 black cards and 6 white cards. Since there are C¹⁰₄ = 10!/(4! 6!) = 210 ways of picking 10 cards so that 4 of them will be black, there will be 210 such combinations. The probability of a combination having 40 per cent black cards and 60 per cent white will therefore be 210 p₁⁴p₂⁶. In general, for N packs, the probability of a combination having N₁/N per cent black cards and (N − N₁)/N per cent white cards will be


$$C^{N}_{N_1}\, p_1^{N_1} p_2^{N-N_1} = \frac{N!}{N_1!\,(N-N_1)!}\, p_1^{N_1} p_2^{N-N_1}$$
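The counting in the card model can be checked mechanically. The following is a minimal Python sketch, assuming Python 3.8 or later for math.comb; the function name combination_probability is hypothetical. It multiplies the number of arrangements by the probability of any one arrangement, as in the argument above.

```python
from math import comb


def combination_probability(n1: int, n: int, p1: float) -> float:
    """Probability of n1 black cards when one card is taken from each of n packs,
    every pack holding a proportion p1 of black cards and p2 = 1 - p1 of white."""
    p2 = 1.0 - p1
    # comb(n, n1) counts the arrangements; each has probability p1**n1 * p2**(n - n1).
    return comb(n, n1) * p1 ** n1 * p2 ** (n - n1)


print(comb(10, 4))  # 210 ways to have 4 black cards among 10
total = sum(combination_probability(k, 10, 0.1) for k in range(11))
print(round(total, 10))  # 1.0: the probabilities over all outcomes sum to one
```

For N in the thousands, comb(n, n1) becomes too large to convert to a floating-point number, so a practical version would work with logarithms (for example via math.lgamma).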


If this mathematical model is correct, the equation just given will yield the probabilities of samples having N₁/N per cent defective parts and (N − N₁)/N per cent nondefective parts in the whole set of

1 It should be carefully noted here that the members are identified as individuals so that a particular combination means a particular combination of members. Thus a combination of three members in which the first two were defective and the last nondefective would not be the same combination as one in which the first was defective and the last two nondefective.
samples of N each that might be drawn at random (with replacements) from a population containing p₁ per cent defective and p₂ per cent nondefective parts. For, as noted above,¹ if the process of sampling is random and each sample item is replaced before the next is drawn, it is reasonable to suppose that every particular combination of parts will appear as frequently as every other particular combination of parts, so that the probability of samples having N₁/N per cent defective and (N − N₁)/N per cent nondefective parts will tend to be the same as the probability of combinations of cards having N₁/N per cent black and (N − N₁)/N per cent white cards among the set of all possible combinations of cards that might be made of one card from each of N different packs, each pack containing p₁ per cent black and p₂ per cent white cards. Although this model is derived for sampling with replacements, it also gives approximate results, as noted above, for sampling without replacements.


If the percentage of defective parts in the sample, N₁/N, is chosen as the sample statistic, the sampling distribution of this statistic will therefore be

$$P\left(\frac{N_1}{N}\right) = \frac{N!}{N_1!\,(N-N_1)!}\, p_1^{N_1} p_2^{N-N_1} \qquad (1)$$
This will be recognized as the general equation for a binomial distribution.² Hence it follows that the sampling distribution of a percentage has³ the following mean, standard deviation, β₁ and β₂:

$$\text{mean} = p_1, \qquad \sigma = \sqrt{\frac{p_1 p_2}{N}}, \qquad \beta_1 = \frac{(p_1 - p_2)^2}{N p_1 p_2}, \qquad \beta_2 = 3 + \frac{1 - 6 p_1 p_2}{N p_1 p_2}$$
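These four constants can be confirmed numerically by summing over the binomial probabilities of Eq. (1) and comparing the results with the closed forms just stated. A minimal Python sketch (3.8+ for math.comb); both function names are hypothetical.

```python
from math import comb, sqrt


def moments_by_summation(n: int, p1: float):
    """Mean, standard deviation, beta1 and beta2 of N1/N, computed directly
    from the binomial probabilities of Eq. (1)."""
    p2 = 1.0 - p1
    probs = [comb(n, k) * p1 ** k * p2 ** (n - k) for k in range(n + 1)]
    values = [k / n for k in range(n + 1)]
    mean = sum(p * x for p, x in zip(probs, values))

    def central(r: int) -> float:
        # r-th moment about the mean
        return sum(p * (x - mean) ** r for p, x in zip(probs, values))

    mu2, mu3, mu4 = central(2), central(3), central(4)
    return mean, sqrt(mu2), mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def moments_by_formula(n: int, p1: float):
    """The closed forms quoted in the text."""
    p2 = 1.0 - p1
    npq = n * p1 * p2
    return p1, sqrt(p1 * p2 / n), (p1 - p2) ** 2 / npq, 3 + (1 - 6 * p1 * p2) / npq


for n, p1 in [(10, 0.1), (50, 0.3)]:
    print(n, p1, moments_by_summation(n, p1), moments_by_formula(n, p1))
```

The two tuples printed on each line agree to rounding error, and β₁ visibly shrinks as N grows from 10 to 50.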


Factors Affecting the Distribution. Effect of Size of the Sample.
The foregoing equations show that the character of the sampling 


1 See p. 188.
2 See p. 43.
3 Cf. p. 45. Since the attribute is taken as N₁/N instead of N₁, as in the previous discussion of the binomial distribution, the mean of N₁/N is Np₁/N = p₁ and the standard deviation of N₁/N is √(Np₁p₂)/N = √(p₁p₂/N). Since β₁ and β₂ are coefficients, their values are unaffected by this change in units.
distribution of a sample percentage from a twofold population is dependent on the size of the sample N. With a small value of N, the distribution of probabilities may be very skewed, especially if p₁ is markedly different from p₂, and the standard deviation will be large. With a large value of N, the distribution will be more symmetrical (β₁ approaches 0 as N increases), and the standard deviation will be small. If large samples are taken, therefore, the probabilities of samples in which the percentage of a selected attribute deviates slightly from the population percentage will be more or less symmetrically distributed about the population percentage as a central value, and the probabilities of samples in which the percentage deviates by a large amount from the population percentage will be very small. That is, the larger a sample, the smaller the probability of its differing greatly from the population.
A further consequence of increasing the size of the sample N is that the distribution of probabilities tends to approach the normal curve whose mean and standard deviation are the same as those of the binomial distribution.¹ Hence, in large samples, the probabilities of various types of samples can be computed from a normal probability table.² This is an especially important conclusion for the practical application of the mathematical model to the problem.
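How quickly the normal curve takes over can be seen by comparing, for the 10 per cent population, the exact binomial probability that the sample percentage is 11 per cent or less with the normal-table figure. This Python sketch assumes Python 3.8+ (statistics.NormalDist) and evaluates Eq. (1) in logarithms so that N = 2,500 does not overflow; the helper names are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Eq. (1) evaluated in log space so that large n (e.g. 2,500) does not overflow."""
    log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return exp(log_prob)


def exact_cdf(x: float, n: int, p: float) -> float:
    """P(N1/N <= x), summed from the binomial probabilities."""
    return sum(binomial_pmf(k, n, p) for k in range(n + 1) if k / n <= x)


def normal_cdf(x: float, n: int, p: float) -> float:
    """The same probability from a normal curve with mean p and sd sqrt(p q / n)."""
    return NormalDist(p, sqrt(p * (1 - p) / n)).cdf(x)


for n in (10, 100, 2500):
    print(n, round(exact_cdf(0.11, n, 0.10), 4), round(normal_cdf(0.11, n, 0.10), 4))
```

The discrepancy is large for N = 10 and small for N = 2,500. No continuity correction is applied, since the text applies none; shifting the boundary by half of 1/N would generally narrow the remaining gap somewhat.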

1 Cf. pp. 46-47.
2 Actually the ordinates of the binomial distribution are approached by those of the normal curve (see Appendix to Chap. IV, pp. 68-74). If, however, the probability of a given value of N₁/N is represented, not by the height of a line or bar, but by the area of a rectangle whose base is 1/N and whose height is N·P(N₁/N), then as N is increased (and hence σ, which equals √(p₁p₂/N), is decreased) the rectangles become thinner and thinner and their area tends to be approximated by that of the standard normal curve.

Effect of the Population Parameter p₁. The sampling distribution described by Eq. (1) gives the probabilities of various types of samples on the assumption that the percentage of defective parts in the whole population is p₁. As the value of p₁ is varied, the character of the sampling distribution will be changed. For the subsequent discussion it is important to note the nature of these changes.

For samples of size 10 (this small size is taken temporarily for purposes of exposition), Fig. 53 and Table 23 show how the distribution of a sample percentage changes as the value of p₁ is varied from 0 to 1 by intervals of .05. It will be noted that from p₁ = 0 to p₁ = .5 the sampling distribution changes from a positively skewed distribution to a symmetrical one, and then from p₁ = .5 to p₁ = 1 it changes to a negatively skewed distribution. The reader should here center his attention on the character of the rows of Table 23 or on the left-right variation of Fig. 53 as he proceeds from back to front. It will also be noted that for a given sample percentage N₁/N the probability of the sample rises as the population percentage p₁ approaches the point p₁ = N₁/N. That is, the probability of a given sample result is a maximum when p₁ = N₁/N. Here the reader should center his attention on the variation from row to row for a given column of Table 23 or from back to front for a given value of N₁/N in Fig. 53.

Fig. 53. The variation in the character of the distribution of a sample percentage with changes in the population parameter p₁.
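The two readings of Table 23 (across a row and down a column) can be reproduced from Eq. (1). The Python sketch below tabulates the probabilities for N = 10 with p₁ stepped by .05, as described for the table; the printed values are computed here rather than copied from the book, and the names P_GRID, prob and most_likely_p1 are hypothetical.

```python
from math import comb

N = 10
P_GRID = [round(0.05 * i, 2) for i in range(21)]  # p1 = 0, .05, ..., 1.00


def prob(n1: int, p1: float, n: int = N) -> float:
    """Eq. (1): probability of a sample percentage n1/n when the population has p1."""
    return comb(n, n1) * p1 ** n1 * (1 - p1) ** (n - n1)


# One row per hypothetical p1, one column per possible sample percentage N1/N.
table = {p1: [prob(k, p1) for k in range(N + 1)] for p1 in P_GRID}

# Across a row: skewed for small p1, symmetrical at .5, skewed the other way near 1.
for p1 in (0.1, 0.5, 0.9):
    print(f"p1 = {p1:.2f}:", " ".join(f"{x:.3f}" for x in table[p1]))


def most_likely_p1(k: int) -> float:
    """Down a column: the grid value of p1 that makes N1/N = k/N most probable."""
    return max(P_GRID, key=lambda p1: table[p1][k])


for k in range(N + 1):
    print(f"N1/N = {k / N:.1f}: maximum at p1 = {most_likely_p1(k):.2f}")
```

Every column peaks at p₁ = N₁/N, which is the property the text asks the reader to look for when reading from back to front in Fig. 53.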

If the population is very large, variation in p₁ can be assumed to be practically continuous, that is, p₁ can be given any hypothetical value between 0 and 1. On the other hand, for samples of size 10, only 11 values of N₁/N are possible; variation in this quantity is therefore discrete. Accordingly, Fig. 53 consists essentially of a series of curves running from back to front and separated from left to right by intervals of .1. If sampling is made with a much larger sample, say a sample of 2,500, the variation in N₁/N may also be considered to be practically continuous. In that instance, Fig. 53 can be replaced by the smooth surface of Fig. 54.

Fig. 54. The surface that approximates Fig. 53 when N is large.


USES OF THE DISTRIBUTION OF THE SAMPLE PERCENTAGE FOR 
TESTING HYPOTHESES 


Up to this point the argument has been purely deductive. Given a certain population, it has been asked, how frequently will random sampling produce certain types of samples? The crucial part of the argument is the reverse of this: Given a certain sample, what inferences can be made about the population from which it was obtained? It is this part of the argument that will now be examined.

The Hypothesis. With reference to the original illustration, 
suppose that the manufacturer of machine parts is particularly 
interested in the hypothesis that the whole set or population of 
parts has just the standard 10 per cent of defectives. He takes a 
sample of 2,500 parts and finds that 11 per cent of the sample are defective. How does his hypothesis fare in the light of this sample result?
Coefficient of Risk. The first step is to decide upon the coefficient of risk. The manufacturer recognizes, of course, that even if the number of defective parts in the whole population is just 10 per cent he might by chance get a sample in which the percentage of defective parts is much greater or less than 10 per cent. There is always some risk that he will reject the 10 per cent hypothesis when it is actually true. The initial step, therefore, is to decide how great a risk of this kind he is willing to run. Suppose he decides that on the average he does not want to reject the hypothesis when it is true more than 5 times out of 100. He will then adopt a coefficient of risk of .05.

Fig. 55. A region of rejection for testing the hypothesis p₁ = .10. Coefficient of risk = .05.

A .05 Region of Rejection. From the analysis of the previous section, the manufacturer knows that if he takes a sample of 2,500 from a large population the probabilities of the various possible types of samples will be given approximately by a normal probability curve whose mean is at the population percentage p₁ and whose standard deviation is σ = √(p₁p₂/N). From this he knows that if the number of defectives in the whole population is actually 10 per cent (p₁ = .10), then the probability of obtaining a sample in which the percentage of defectives is equal to or less than .10 − 1.96σ is .025 (see normal table in Appendix, Table VI). He also knows that the probability of obtaining a sample in which the percentage of defectives is equal to or greater than .10 + 1.96σ is likewise .025. Hence, according to the addition theorem, the probability of a sample in which the percentage of defectives lies outside of the limits .10 + 1.96σ and .10 − 1.96σ is .025 + .025 = .05. Since σ = √((.10)(.90)/2,500) = .006, the actual values of these limits are 8.82 per cent and 11.18 per cent (see Fig. 55). It follows, therefore, that the risk of rejecting the hypothesis when it is true will be just .05 if the manufacturer rejects it whenever the sample percentage falls in this region; for when the hypothesis is true, samples will fall in the region (below 8.82 per cent and above 11.18 per cent) only 5 per cent of the time. Hence, if the manufacturer follows this rule he will, on the average, reject the hypothesis when it is true only 5 times out of 100, which is the degree of risk he is willing to undergo.

Fig. 56. An alternative region of rejection for testing the hypothesis p₁ = .10. Coefficient of risk = .05.
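The limits of the region of Fig. 55 follow from the normal curve in three steps: compute σ under the hypothesis, take the two-tailed point 1.96, and mark off ±1.96σ about .10. A minimal Python sketch (3.8+, statistics.NormalDist), with the 11 per cent sample of the example checked against the limits; two_sided_region is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_region(p0: float, n: int, risk: float = 0.05):
    """Limits outside which the hypothesis p1 = p0 is rejected, splitting the
    risk equally between the two tails (the region of Fig. 55)."""
    sigma = sqrt(p0 * (1 - p0) / n)
    z = NormalDist().inv_cdf(1 - risk / 2)  # about 1.96 when risk = .05
    return p0 - z * sigma, p0 + z * sigma, sigma


low, high, sigma = two_sided_region(0.10, 2500)
print(f"sigma = {sigma:.3f}")  # sigma = 0.006
print(f"reject below {100 * low:.2f} or above {100 * high:.2f} per cent")  # 8.82, 11.18

sample_pct = 275 / 2500  # the 11 per cent observed in the example
print("reject" if not low <= sample_pct <= high else "do not reject")  # do not reject
```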

Alternative Regions of Rejection. It will be noted, however, that this region of rejection is not the only one that the manufacturer might adopt. He can follow the rule of rejecting the hypothesis whenever the sample percentage falls above .10 + 1.645σ, that is, above 10.99 per cent, and his risk of rejecting the hypothesis when it is true will still be only .05. For, from a table of the normal curve, the probability of a deviation from the mean of 1.645σ or more is just .05. The region beyond 10.99 per cent is therefore an alternative region of rejection with a coefficient of risk of .05. This is illustrated in Fig. 56. Still a third region with a coefficient of risk of .05 is the region lying below .10 − 1.645σ, that is, below 9.01 per cent. For, owing to the symmetry of the normal curve, the probability of a negative deviation from the mean of 1.645σ or more is also .05. This third alternative region of rejection is illustrated in Fig. 57. Yet a fourth region with a coefficient of risk of .05 is the region lying below .10 − 2.327σ and above .10 + 1.751σ, that is, below 8.60 per cent and above 11.05 per cent; the two tails contain .01 and .04 of the area of the normal curve, together .05. This is illustrated in Fig. 58.

Fig. 57. A third alternative region of rejection for testing the hypothesis p₁ = .10. Coefficient of risk = .05.

Fig. 58. A fourth alternative region of rejection for testing the hypothesis p₁ = .10. Coefficient of risk = .05.
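Each of the four regions simply divides the .05 risk differently between the two tails. This Python sketch (3.8+) recovers their boundaries from that split; the dictionary SPLITS and the helper region_limits are illustrative names, and the normal percentage points come from statistics.NormalDist rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
P0, N = 0.10, 2500
SIGMA = sqrt(P0 * (1 - P0) / N)  # .006 under the hypothesis

# (risk in the lower tail, risk in the upper tail); each pair adds to .05.
SPLITS = {
    "Fig. 55": (0.025, 0.025),
    "Fig. 56": (0.0, 0.05),
    "Fig. 57": (0.05, 0.0),
    "Fig. 58": (0.01, 0.04),
}


def region_limits(lower_risk: float, upper_risk: float):
    """Lower and upper boundaries of a rejection region; None marks an absent tail."""
    low = P0 + Z.inv_cdf(lower_risk) * SIGMA if lower_risk > 0 else None
    high = P0 + Z.inv_cdf(1 - upper_risk) * SIGMA if upper_risk > 0 else None
    return low, high


for name, (lo_risk, hi_risk) in SPLITS.items():
    low, high = region_limits(lo_risk, hi_risk)
    parts = []
    if low is not None:
        parts.append(f"below {100 * low:.2f}%")
    if high is not None:
        parts.append(f"above {100 * high:.2f}%")
    print(f"{name}: reject if {' or '.join(parts)} (risk {lo_risk + hi_risk:.2f})")
```

Any other pair of tail risks adding to .05 would give yet another admissible region, which is the point the text goes on to make.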

There are thus an infinite number of regions of rejection that the manufacturer might adopt, all of which have associated with them a coefficient of risk of .05. What is the criterion that he should follow in choosing from among these various possible regions?
The Best Region of Rejection. Each region of rejection has a corresponding region of acceptance. As suggested in the previous chapter, the "best" region of acceptance would be the one for which the risk of accepting a given hypothesis when it is not true is a minimum. Since the probability of a sample falling in the region of acceptance is equal to 1 minus the probability of its falling in the region of rejection, the former will be a minimum when the latter is a maximum. The best region of rejection, therefore, will be a region for which the probability of rejection when the given hypothesis is not true is a maximum. How the various regions suggested above fare in respect to this criterion is indicated by the probabilities shown in Table 24. Table 24 gives data for each of the four alternative regions, showing how the probability of a sample falling in the region varies with the population percentage p₁. The probability of rejection when the hypothesis is not true is called the power of the test made with the selected region. The criterion is to select the region of greatest power. The region so selected will then give the "most powerful" test.

Fig. 59. The probability of a sample falling in the region of Fig. 56 varies with the population percentage p₁, illustrating the procedure for finding the data of Table 24, from which Fig. 60 is plotted.
The method of computing the data in Table 24 is illustrated in Fig. 59, and the results are presented in Fig. 60, which is based upon Table 24.

It is obvious from this figure that there is no one region for which the probability of a sample is the greatest for all possible values of the population percentage different from the one set up by hypothesis. The region above .1099 (the region of Fig. 56) has the highest probability for population percentages greater than the hypothetical percentage being tested (that is, p₁ = .10 in this case), and the region below .0901 (the region of Fig. 57) has the highest probability for population percentages less than the hypothetical percentage. The region below .0882 and above .1118 (the region of Fig. 55) does well for values of p₁ both greater and less than the hypothesis being tested, but it does not have the maximum probability for any values. There is not in this instance any single region of rejection that is best for all problems.

Fig. 60. Effect that variation in the population percentage p₁ has on the probability of a sample falling in given regions of rejection.

In order to determine the region of rejection that is best for the present problem the manufacturer must draw upon other considerations. Suppose he wishes particularly to avoid unnecessary and expensive refinements in production methods and accordingly aims to keep the percentage of defectives as close as possible to the accepted standard of 10 per cent defective parts. At the same time, suppose he does not care if orders are sent out that are below standard; it may be that the purchaser will carry out his own test to avoid buying a substandard set. Accordingly, the manufacturer will select the region of Fig. 57, i.e., below .0901, as the best region to use.

TABLE 24. VARIATIONS IN THE PROBABILITY OF A SAMPLE FALLING IN
GIVEN REGIONS FOR DIFFERENT VALUES OF THE POPULATION p₁*

                      Probability of a sample in
Values     Region of    Region of    Region of    Region of
of p₁      Fig. 55      Fig. 56      Fig. 57      Fig. 58
.091       .3157        .0005        .4404        .1932
.092       .2581        .0010        .3745        .1522
.093       .2067        .0018        .3121        .1174
.094       .1623        .0033        .2546        .0876
.095       .1272        .0055        .2033        .0671
.096       .0971        .0091        .1587        .0524
.097       .0758        .0154        .1270        .0435
.098       .0619        .0227        .0934        .0401
.099       .0521        .0344        .0694        .0464
.100       .0500        .0500        .0500        .0501
.101       .0597        .0708        .0359        .0636
.102       .0653        .0968        .0250        .0834
.103       .0824        .1292        .0174        .1102
.104       .1069        .1685        .0116        .1439
.105       .1389        .2033        .0078        .1851
.106       .1782        .2643        .0049        .2333
.107       .2218        .3228        .0032        .2846
.108       .2718        .3821        .0020        .3448
.109       .3304        .4443        .0013        .4053

* These figures were calculated as follows: for each value of p₁ the standard deviation σ = √(p₁p₂/N) was computed; the difference between the mean and the boundary or boundaries of a region was then divided by σ, and the probability of a deviation greater than this value was obtained from a normal probability table. These are the figures given in the table above.
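The entries of Table 24 can be recomputed by the procedure of the footnote: for each candidate p₁, evaluate σ at that p₁ and find the normal-curve area lying in the region. A Python sketch (3.8+); REGIONS and prob_in_region are hypothetical names. Most computed entries agree with the printed table to within about .005, though a few differ by more, so the sketch illustrates the method rather than reproducing every printed digit.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
P0, N = 0.10, 2500
S0 = sqrt(P0 * (1 - P0) / N)  # sigma when the hypothesis p1 = .10 is true

# Boundaries (lower, upper) of the four regions; None means that tail is absent.
REGIONS = {
    "Fig. 55": (P0 - Z.inv_cdf(0.975) * S0, P0 + Z.inv_cdf(0.975) * S0),
    "Fig. 56": (None, P0 + Z.inv_cdf(0.95) * S0),
    "Fig. 57": (P0 - Z.inv_cdf(0.95) * S0, None),
    "Fig. 58": (P0 + Z.inv_cdf(0.01) * S0, P0 + Z.inv_cdf(0.96) * S0),
}


def prob_in_region(p1: float, low, high, n: int = N) -> float:
    """Normal approximation to P(sample falls in the region) when the true
    proportion is p1; sigma is re-evaluated at p1, as the footnote describes."""
    dist = NormalDist(p1, sqrt(p1 * (1 - p1) / n))
    prob = 0.0
    if low is not None:
        prob += dist.cdf(low)
    if high is not None:
        prob += 1.0 - dist.cdf(high)
    return prob


print("p1     " + "  ".join(REGIONS))
for i in range(91, 110):
    p1 = i / 1000
    row = [prob_in_region(p1, *REGIONS[name]) for name in REGIONS]
    print(f"{p1:.3f}  " + "  ".join(f"{x:.4f}" for x in row))
```

At p₁ = .100 every column equals .05, the common coefficient of risk; away from .100 each column traces the power of its region.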
On the other hand, if the manufacturer desires especially to protect his reputation and wishes particularly to avoid sending out an order that is actually below standard, he will select the region of Fig. 56, i.e., above .1099, as the best region to use. In this instance, he will accept the hypothesis of 10 per cent least often when the population percentage is actually above 10 per cent. He runs the least risk¹ in this case of damaging his reputation.

If the manufacturer wishes especially to avoid the danger of damaging his reputation by sending out substandard lots but nevertheless desires to give some consideration to the additional expense of attaining too high a standard, he might adopt a region such as that lying above .1105 and below .0860, that is, the region of Fig. 58. If he is equally indifferent or equally concerned about his reputation and the additional expense of attaining a high standard, he would do best to choose the region lying above .1118 and below .0882, the region of Fig. 55. For this latter region favors values neither above nor below the hypothesis being tested; it is an "unbiased" region.²

In the problem illustrated, suppose that the manufacturer adopts the unbiased region lying below .0882 and above .1118, the region of Fig. 55. He knows that if this region is always adopted in his sampling tests he will in the long run reject the 10 per cent hypothesis when it is actually true only 5 times out of 100. He also knows that the probability of accepting the 10 per cent hypothesis when it is not true is evenly distributed with respect to actual percentages above and below 10 per cent. He will in this instance neither cause undue damage to his reputation nor induce himself to incur an unnecessary expense of production.

1 That is, least risk for the size region adopted. If the manufacturer had 
adopted a coefficient of risk of .10 of rejecting the 10 per cent hypothesis 
when it was actually true, his regions of rejection would all have been larger 
and his regions of acceptance correspondingly smaller. There would be less 
risk of accepting the hypothesis in this case when it was not true. When
it is said in the text that a certain region gives the least risk of accepting the 
hypothesis being tested when some other values are true, it is meant that 
this is the least risk for any region of the specified size, i.e., for any region 
for which the associated risk of rejection is the specified coefficient .05. 

2 Generally a region is said to be "unbiased" if the probability of a sample falling within it is less when the given hypothesis is true than when any alternative hypothesis is true.
The Test of the Hypothesis. The actual sample result is 11 per cent. Since this does not fall in the adopted region of rejection (11 per cent lies between 8.82 and 11.18 per cent), the hypothesis of p₁ = 10 per cent is not rejected and the given lot of parts is deemed to be of standard quality.


USE OF THE DISTRIBUTION OF A SAMPLE PERCENTAGE TO
ESTIMATE THE VALUE OF THE POPULATION PERCENTAGE

The foregoing pages were primarily concerned with the rejection or acceptance of a particular hypothesis regarding the percentage of defective parts in the population. They did not indicate the limits within which this percentage might reasonably lie, nor did they indicate what might be a good estimate of the actual percentage of defectives in the whole population. Often these considerations are of more importance than the testing of a particular hypothesis.

Determining Confidence Intervals. The procedure for determining a reasonable range for a population parameter, given a particular sample, is much the same as that followed in testing hypotheses regarding that parameter. First it is necessary to decide upon the degree of risk that is to be run in failing to include the true value of the parameter within the estimated range. To put it positively, it is first necessary to determine the degree of confidence that may be had in the estimated range covering the true value. Furthermore, if no one type of range is found to be the best, it is necessary to decide whether failure to include the true value because the range is placed too low is of equal importance with failure to include the true value because the range is placed too high. These matters will now be considered in some detail.

Determining an Unbiased Interval. In the example of the preceding section the manufacturer of machine parts took a sample of 2,500 parts from a large lot and found that, of these 2,500 parts, 275, or 11 per cent, were defective. On the basis of this sample, the manufacturer may determine an unbiased¹ confidence interval as follows.

1 The interval is called "unbiased" because it is so determined that the probability of the interval failing to cover the true value because it is too high is equal to the probability of the interval failing to cover the true value because it is too low. It happens in the present instance that this unbiased interval is also symmetrical about the sample value. In other cases, however, it will be found that an interval that is unbiased in the probability sense is not symmetrical around the sample value (see pp. 287-289).

The confidence coefficient associated with this interval will be put at .95. First let the manufacturer find the value of p₁ that will make the difference between the sample percentage and it, that is N₁/N − p₁, just equal to 1.96σ. Since σ = √(p₁p₂/N), this value of p₁ will be given by the solution of the equation

$$\frac{N_1}{N} - p_1 = 1.96\sqrt{\frac{p_1 p_2}{N}}$$

for p₁, which leads to the approximate equation¹

$$\underline{p}_1 = \frac{N_1}{N} - 1.96\sqrt{\frac{N_1(N - N_1)}{N^3}} \qquad (2)$$

In the present instance this formula yields the result $\underline{p}_1$ = .098.


It will be noted that this lower limit for p₁ is designated $\underline{p}_1$ (called "p₁ lower"). The procedure is illustrated in Fig. 61. Next let the manufacturer find the value of p₁ that makes the difference between the sample percentage and it just equal to −1.96σ. This second value will be given by the equation

$$\frac{N_1}{N} - p_1 = -1.96\sqrt{\frac{p_1 p_2}{N}}$$

or by the approximate equation²

$$\bar{p}_1 = \frac{N_1}{N} + 1.96\sqrt{\frac{N_1(N - N_1)}{N^3}} \qquad (3)$$

For the given problem this yields $\bar{p}_1$ = .122. Again it will be noted that this upper limit for p₁ is designated by $\bar{p}_1$ (called "p₁ upper"). For illustration the reader is again referred to Fig. 61.
1 The approximate formula is obtained by substituting N₁/N for p₁ and (N − N₁)/N for p₂ in the radical on the right, that is, σ is estimated from the sample percentages. The formula given by the solution of the equation is

$$\underline{p}_1 = \frac{N_1 + 1.92 - 1.96\sqrt{.96 + \dfrac{N_1(N - N_1)}{N}}}{N + 3.84}$$

but when N is large, as it should be when the normal curve is taken as an approximation to the binomial distribution, this more exact formula differs but little from the approximate one given in the text.

2 The exact formula is the same as that given in the preceding footnote except that the square root is now preceded by a plus sign instead of a negative sign.
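Both the approximate limits of Eqs. (2) and (3) and the exact roots given in the footnotes can be computed directly. A Python sketch; the function names are hypothetical, and the exact version keeps z = 1.96 unrounded, so its constants are 1.9208, .9604 and 3.8416 rather than the footnote's 1.92, .96 and 3.84. (The exact limits are what is nowadays usually called the Wilson score interval.)

```python
from math import sqrt


def approximate_limits(n1: int, n: int, z: float = 1.96):
    """Eqs. (2) and (3): sigma estimated from the sample percentage n1/n."""
    x = n1 / n
    half_width = z * sqrt(n1 * (n - n1) / n ** 3)
    return x - half_width, x + half_width


def exact_limits(n1: int, n: int, z: float = 1.96):
    """The two roots of n1/n - p1 = +/- z * sqrt(p1 * (1 - p1) / n),
    written in the form of the footnote's formula."""
    centre = n1 + z ** 2 / 2
    spread = z * sqrt(n1 * (n - n1) / n + z ** 2 / 4)
    denom = n + z ** 2
    return (centre - spread) / denom, (centre + spread) / denom


lower, upper = approximate_limits(275, 2500)
print(f"approximate: {lower:.3f} to {upper:.3f}")  # approximate: 0.098 to 0.122
lower_x, upper_x = exact_limits(275, 2500)
print(f"exact:       {lower_x:.3f} to {upper_x:.3f}")
```

With N = 2,500 the two versions differ only in the third decimal place, in line with the footnote's remark that they differ but little when N is large.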
These two values of $p_1$, viz., $\underline{p}_1$ and $\bar{p}_1$, mark off an interval that the manufacturer may claim has a probability of .95 of covering the true value. This means that, if he continuously follows the foregoing procedure in sampling from a twofold population, the interval he sets up will on the average cover the population value 95 times out of 100. For, on the assumption that the normal curve is a good approximation to the binomial distribution, sample percentages will fall within the range $\pm 1.96\sigma$ of the population percentage 95 per cent of the time. Hence, ranges of $\pm 1.96\sigma$ about the sample percentages will include the population percentage 95 per cent of the time. It is only when the sample percentage deviates more than $1.96\sigma$ from the population value that a range of $\pm 1.96\sigma$ about the sample percentage will fail to include the population value, and according to the normal probability curve the probability of such an event is only .05. It will be noted that the manufacturer speaks, not of the probability of the population value falling in the confidence interval, but of the probability of the interval covering the population value. For it is the interval that varies from sample to sample, not the population value; the latter is an unknown constant.

Fig. 61.—Determination of an unbiased confidence interval for $p_1$; confidence coefficient = .95. Note that $\sigma = \sqrt{p_1p_2/N}$ and hence varies with the values of $p_1$ and $p_2$.
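A short simulation can make this long-run reading of the .95 concrete. What follows is a minimal Python sketch, not the authors' procedure. The helper names are hypothetical, the population value .10 is assumed for the simulation, and each binomial sample of 2,500 is generated item by item with the standard `random` module. `approx_limits` estimates $\sigma$ from the sample, as in the text. `exact_limits` solves $(N_1/N - p_1)^2 = 1.96^2 p_1(1 - p_1)/N$ for $p_1$, which gives the more exact formula of the footnotes.

```python
import math
import random

def approx_limits(n1, n, z=1.96):
    """Limits N1/N -/+ z*sqrt(N1(N - N1)/N^3), with sigma estimated from the sample."""
    phat = n1 / n
    sigma_hat = math.sqrt(n1 * (n - n1) / n**3)
    return phat - z * sigma_hat, phat + z * sigma_hat

def exact_limits(n1, n, z=1.96):
    """Both roots of (N1/N - p)^2 = z^2 p(1 - p)/N (sigma evaluated at the limit itself)."""
    centre = n1 + z**2 / 2                        # 1.92 when z = 1.96
    half = z * math.sqrt(z**2 / 4 + n1 * (n - n1) / n)
    return (centre - half) / (n + z**2), (centre + half) / (n + z**2)

def coverage(p, n, trials=1000, z=1.96, seed=1):
    """Fraction of simulated samples whose approximate interval covers p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        n1 = sum(rng.random() < p for _ in range(n))   # one binomial sample
        lo, hi = approx_limits(n1, n, z)
        hits += lo <= p <= hi
    return hits / trials

print(approx_limits(275, 2500))   # roughly (0.0977, 0.1223)
print(exact_limits(275, 2500))    # roughly (0.0983, 0.1229)
print(coverage(0.10, 2500))       # close to .95
```

For the manufacturer's sample of 275 defectives in 2,500, the two pairs of limits differ only in the fourth decimal place. The simulated coverage should come out close to .95; its exact value depends on the seed.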
Other Confidence Intervals. The confidence interval just determined is not the only interval with a confidence coefficient of .95. Other intervals with the same confidence coefficient may also be derived. For example, it is possible for the manufacturer to select an interval that has only a variable lower limit, the upper limit being the maximum attainable value of $p_1 = 1.00$. This lower limit may be computed by finding the value of $p_1$ that will make the probability of getting the given sample percentage $N_1/N$ or a greater value just .05. Such a value of the lower limit will be given by the equation $N_1/N - p_1 = 1.645\sigma$ or the equation

$$p_1 = \frac{N_1}{N} - 1.645\sqrt{\frac{N_1(N - N_1)}{N^3}}.$$

For the given case in which $N = 2{,}500$ and $N_1 = 275$, this lower limit will be .100. The manufacturer can therefore say with equal validity that the interval .100-1.000 has a chance of 95 out of 100 of including the population value.

Again, with the same confidence coefficient of .95, the manufacturer may choose an interval that has only a variable upper bound, the lower bound being the minimum possible value of 0. This upper limit can be computed by finding the value of $p_1$ that will make the probability of the sample percentage or a lower percentage just equal to .05. The value will be given by the equation $N_1/N - p_1 = -1.645\sigma$ or by the formula

$$p_1 = \frac{N_1}{N} + 1.645\sqrt{\frac{N_1(N - N_1)}{N^3}}.$$

For a sample of 2,500 and a sample percentage of 11, this upper limit will be .120. Hence, with as much truth as in the other cases, the manufacturer can say that the range 0-.120 has a chance of .95 of including the population value.

Finally, the manufacturer may adopt an interval determined in the following manner. He may determine a lower limit such that, if it were the population value, the probability of getting the sample percentage of 11 per cent or a greater value will be just .04. And he may determine an upper limit such that, if it were the population value, the probability of getting the sample percentage or a lower value will be just .01. These two limits will be given by the equations¹

$$\frac{N_1}{N} - \underline{p}_1 = 1.751\sigma \quad\text{and}\quad \frac{N_1}{N} - \bar{p}_1 = -2.327\sigma,$$

or by the equations

$$\underline{p}_1 = \frac{N_1}{N} - 1.751\sqrt{\frac{N_1(N - N_1)}{N^3}} \quad\text{and}\quad \bar{p}_1 = \frac{N_1}{N} + 2.327\sqrt{\frac{N_1(N - N_1)}{N^3}}.$$

For a sample of 2,500 and a sample percentage of 11, these limits will be .099 and .125. Thus the manufacturer might say, as in the other cases, that the interval .099-.125 has a chance of .95 of including the population figure.
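The three alternative intervals differ only in how the total risk of .05 is divided between the two ends. As a rough sketch, assuming $\sigma$ is estimated from the sample as in the text, the Python function below (its name is illustrative) takes the two tail risks and returns the corresponding limits. It uses `statistics.NormalDist().inv_cdf` for the normal deviates. Because it uses unrounded deviates (1.6449, 1.7507, 2.3263 rather than 1.645, 1.751, 2.327), its limits match the text's only after rounding to three places.

```python
import math
from statistics import NormalDist

def sigma_hat(n1, n):
    """Sample estimate of sigma for a percentage: sqrt(N1 (N - N1) / N^3)."""
    return math.sqrt(n1 * (n - n1) / n**3)

def split_tail_limits(n1, n, lower_tail, upper_tail):
    """Limits with `lower_tail` risk assigned to the lower limit and `upper_tail`
    risk to the upper limit; a tail of 0 leaves that side at its natural bound
    (0 or 1). Confidence coefficient = 1 - lower_tail - upper_tail."""
    z = NormalDist().inv_cdf          # standard normal deviate for a cumulative probability
    phat, s = n1 / n, sigma_hat(n1, n)
    lo = phat - z(1 - lower_tail) * s if lower_tail > 0 else 0.0
    hi = phat + z(1 - upper_tail) * s if upper_tail > 0 else 1.0
    return lo, hi

# Lower limit only, upper limit only, and the .04/.01 split of the text.
for tails in [(0.05, 0.0), (0.0, 0.05), (0.04, 0.01)]:
    lo, hi = split_tail_limits(275, 2500, *tails)
    print(tails, round(lo, 3), round(hi, 3))
```

Run on the sample of 275 defectives in 2,500, the loop prints the three pairs of limits:
- 0.1 and 1.0
- 0.0 and 0.12
- 0.099 and 0.125

These match the three intervals described above.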
The Best Interval. There are accordingly an infinite number of different ranges, all associated with a confidence coefficient of .95, that might be adopted in setting up confidence limits for the population parameter $p_1$. Is there any one range that might be designated the "best"? Before answering this question, consider the various values covered by the different intervals. Compare the confidence interval given by the limits

$$\frac{N_1}{N} \pm 1.96\sqrt{\frac{N_1(N - N_1)}{N^3}}$$

with the confidence interval given by the limits

$$\frac{N_1}{N} - 1.645\sqrt{\frac{N_1(N - N_1)}{N^3}} \quad\text{and}\quad 1.$$

For an actual population value of, say, 10 per cent, the latter interval will include the values above 10 per cent, especially the higher values such as 13 per cent, 14 per cent, etc., much more frequently than the former would. For the latter always runs up to 100 per cent no matter what the sample percentage. The former, by contrast, runs up as far as 13 per cent, say, only if the sample value is as high as 11.68 per cent.² On the assumption that the population value is 10 per cent, such a sample value is likely to be a rare phenomenon indeed.³


¹ For .04 of the normal curve lies above $1.751\sigma$ from the mean and .01 lies below $-2.327\sigma$ from the mean.

² This is calculated from the exact formula $\bar{p}_1 - N_1/N = 1.96\sqrt{\bar{p}_1(1 - \bar{p}_1)/N}$ with $\bar{p}_1 = .13$.

³ If $p_1 = 10$ per cent, then $\sigma = .006$ and $(11.68 \text{ per cent} - 10 \text{ per cent})/\sigma = 2.80$. The probability of a normally distributed variate exceeding its mean by more than 2.80 times $\sigma$ is only .0026.
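The figures in these footnotes can be checked numerically. This Python sketch uses hypothetical variable and function names. It first inverts the exact upper-limit equation to find the sample percentage at which the upper limit reaches 13 per cent. It then evaluates the normal tail beyond that point for an assumed population value of 10 per cent.

```python
import math
from statistics import NormalDist

def sample_pct_for_upper_limit(upper, n, z=1.96):
    """Sample percentage N1/N at which the exact upper limit equals `upper`,
    from  upper - N1/N = z * sqrt(upper (1 - upper) / N)."""
    return upper - z * math.sqrt(upper * (1 - upper) / n)

n, p_true = 2500, 0.10
needed = sample_pct_for_upper_limit(0.13, n)      # about .1168
sigma = math.sqrt(p_true * (1 - p_true) / n)        # .006 when p1 = .10
z_needed = (needed - p_true) / sigma                # about 2.80 standard deviations
tail = 1 - NormalDist().cdf(z_needed)               # chance of so high a sample value
print(round(needed, 4), round(z_needed, 2), round(tail, 4))
```

Without rounding the deviate to 2.80, the tail comes to about .0025 rather than .0026. The difference is immaterial to the argument.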
On the other hand, the second type of interval always has a larger value for its lower limit than the first, and it will therefore include values below 10 per cent (the actual population value) less frequently than the first type of interval. The same type of comparison could be made between two other intervals: the interval given by the limits $N_1/N \pm 1.96\sqrt{N_1(N - N_1)/N^3}$, and the interval given by the limits 0 and $N_1/N + 1.645\sqrt{N_1(N - N_1)/N^3}$. It would be found that the former would include values of $p_1$ below the actual value of 10 per cent much less frequently than the latter, while the latter would include values above 10 per cent less frequently.

In the problem illustrated it would seem that the manufacturer would prefer the interval running from 0 to

$$\bar{p}_1 = \frac{N_1}{N} + 1.645\sqrt{\frac{N_1(N - N_1)}{N^3}},$$

for this would put the best face on his product. It would include the true value 95 per cent of the time, just as the other intervals would, and it would put the upper limit no higher than necessary. The buyer would be at no disadvantage if this method of stating the limits were employed. For if he understood their derivation, he would know that 5 times out of 100 he might get a lot in which the percentage of defective parts exceeded the stated upper limit. He would not buy unless he were willing to run this risk.
Influence of the Size of the Sample on the Confidence Interval. It will be noted that in the foregoing analysis the size of an interval depended upon the value of $\sigma$, and this in turn depended on the value of $N$, the size of the sample. Since $\sigma = \sqrt{p_1p_2/N}$, the larger the sample, the smaller the value of $\sigma$, and hence the narrower the confidence interval. That is, for a given confidence coefficient, say a probability of .95, the larger the sample, the more closely the population value may be estimated.
Influence of the Size of the Confidence Coefficient on the Confidence Interval. The size of the confidence interval is also dependent on the size of the confidence coefficient. For example, suppose that in setting up an unbiased confidence interval the manufacturer had been willing to have the interval cover the population value only 90 times out of 100. He could then have set up limits given by $N_1/N \pm 1.645\sigma$ instead of $N_1/N \pm 1.96\sigma$.¹ Accordingly, by choosing a lower confidence coefficient, the manufacturer could have narrowed his confidence interval. The same is true for other types of confidence intervals.
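The two influences just described can be seen side by side in a small Python sketch. It assumes the unbiased (equal-tailed) interval with $\sigma = \sqrt{p_1p_2/N}$ and takes $p_1 = .11$ purely for illustration. `half_width` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def half_width(p1, n, confidence):
    """Half-width z * sqrt(p1 p2 / N) of an equal-tailed interval with the given coefficient."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # 1.645 for .90, 1.96 for .95
    return z * math.sqrt(p1 * (1 - p1) / n)

# Effect of the sample size at a fixed coefficient of .95.
for n in (625, 2500, 10_000):
    print(n, round(half_width(0.11, n, 0.95), 4))

# Effect of the confidence coefficient at a fixed sample of 2,500.
for conf in (0.90, 0.95, 0.99):
    print(conf, round(half_width(0.11, 2500, conf), 4))
```

Quadrupling the sample halves the interval, since $\sigma$ varies as $1/\sqrt{N}$. Lowering the coefficient from .95 to .90 shrinks the half-width from about .0123 to about .0103.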
A Single Estimate of $p_1$. The foregoing has had to do with the determination of a range of reasonable values for $p_1$. In some instances, however, the manufacturer may desire a single estimate of the population percentage. He may wish, for example, to report to a buyer some measure of the quality of the lot sold, and an estimate of the percentage of defective parts in the lot may be a suitable figure for this purpose. Estimation of population parameters by the method of maximum likelihood was outlined in the previous chapter. Consider how it may be applied to the present instance.

In estimating $p_1$ from the sample result, the manufacturer might proceed as follows. He might turn to Fig. 53 or to its more general form, Fig. 54, and pick out the point on the $N_1/N$ axis representing his sample percentage. The chart would have to be one for which the size of the sample, $N$, was the same as that taken by the manufacturer. Then he could proceed in the direction of increasing $p_1$ values until he had reached the highest point on the surface, i.e., until he found the value of $p_1$ that made the probability of the given sample a maximum. Suppose he had taken a sample of 10 items, for example, and 2 had been found defective. He would start at the point .2 on the $N_1/N$ axis of Fig. 53 and proceed perpendicularly to this axis until he had reached the value $p_1 = .2$. There he would find that the probability of his sample result would be a maximum. The maximum-likelihood estimate of $p_1$ would be .2, the same as the sample percentage. This estimate, and all estimates made in this way, are called "maximum-likelihood estimates." They choose the value of the population parameter, in this case $p_1$, so that the probability, or likelihood, of the given sample result, here $N_1/N$, is a maximum.
¹ For the probability of a normally distributed variate deviating from its mean by $\pm 1.645\sigma$ or more is just .10.

In general, the maximum-likelihood estimate of a population percentage is the sample percentage. This may be demonstrated mathematically as follows. Equation (1) shows that if the percentage of attribute A in the population is $p_1$, the probability of a sample of $N$ having $N_1$ A's is¹

$$P\left(\frac{N_1}{N}\right) = \frac{N!}{N_1!\,(N - N_1)!}\,p_1^{N_1}(1 - p_1)^{N - N_1}$$

According to the differential calculus, the value of $p_1$ that will maximize this probability for a given value of $N_1$ and $N$ must satisfy the condition²

$$\frac{dP}{dp_1} = \frac{N!}{N_1!\,(N - N_1)!}\left[N_1p_1^{N_1 - 1}(1 - p_1)^{N - N_1} - (N - N_1)\,p_1^{N_1}(1 - p_1)^{N - N_1 - 1}\right] = 0,$$

which reduces to

$$N_1(1 - p_1) - p_1(N - N_1) = 0$$

or

$$p_1 = \frac{N_1}{N}.$$

If the maximum-likelihood estimate of $p_1$ is written $\hat{p}_1$ to distinguish it from the population value itself, then $\hat{p}_1 = N_1/N$ is the maximum-likelihood estimate of $p_1$. That is, the sample percentage $N_1/N$ is the maximum-likelihood estimate of the population percentage $p_1$.
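The same maximum can be located numerically, which is essentially what the search over the surface of Fig. 53 does. The Python sketch below is an illustration rather than a general estimation routine. It evaluates the logarithm of the binomial probability on a fine grid of $p_1$ values and keeps the largest. The constant factor is dropped because it does not depend on $p_1$. The function names and the grid size are assumptions.

```python
import math

def log_likelihood(p1, n, n1):
    """Log of the binomial probability of N1 A's in N, dropping the constant
    N!/(N1!(N - N1)!), which does not depend on p1."""
    return n1 * math.log(p1) + (n - n1) * math.log(1 - p1)

def mle_by_grid(n, n1, steps=10_000):
    """Crude numerical maximization over p1 = 1/steps, ..., (steps - 1)/steps.
    Assumes 0 < N1 < N, so that the maximum lies inside the grid."""
    grid = (i / steps for i in range(1, steps))
    return max(grid, key=lambda p: log_likelihood(p, n, n1))

print(mle_by_grid(10, 2))       # 0.2, the sample percentage N1/N
print(mle_by_grid(2500, 275))   # 0.11
```

In both cases the grid maximum falls exactly at $N_1/N$, in agreement with the derivation above.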
SAMPLES FROM RELATIVELY SMALL POPULATIONS 
One of the fundamental assumptions of the foregoing analysis was that the population was very large relative to the sample taken. For it was on this basis that the probability of a given attribute in the population was assumed not to be materially affected by the withdrawal of various members of the sample. It was argued, for example, that a sample of 2,500 parts might be taken from a lot of 500,000 parts in which the actual number of defective parts was 10 per cent. After 1,000 parts had been drawn, the percentage of defectives in the 499,000 parts left would still be close to 10 per cent (actually, 9.82 per cent), even if all the 1,000 parts drawn turned out to be defective. Hence the analysis proceeded on the assumption that the percentage of defective parts in the population remained unaffected as the sample items were taken out. That is, the sampling discussed was sampling with replacements. It is the purpose of this section to consider how the analysis must be modified when nonreplacement of the sample items is taken into account.

¹ Here $p_2$ is replaced by its equal $1 - p_1$.
² Generally the logarithm of the probability is maximized, but here it is just as easy to differentiate the function itself.

When the population is of such a size relative to the sample that consideration must be given to the effects of the withdrawal of the sample items on the percentage of a given attribute in the population, then the distribution of a sample percentage can no longer be adequately described by the binomial distribution. Instead, as may be shown by an analysis similar to that of Chap. IV, the mathematical law describing the distribution of a sample percentage when replacements are not made is the hypergeometrical distribution. Suppose a random sample of $N$ cases is taken from a population consisting of $S$ cases. Let the percentage of cases in the population having attribute A be $p_1$, and the percentage not having the attribute A be $1 - p_1 = p_2$. Then the probability of a sample (drawn without replacements) containing $N_1$ A's is given by the formula

$$P\left(\frac{N_1}{N}\right) = \frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{N_1!\,(p_1S - N_1)!\,(p_2S - N + N_1)!\,S!\,(N - N_1)!} \qquad (4)$$

This substitution of the hypergeometrical distribution for the binomial distribution is the fundamental modification that must be made in the previous analysis when sampling is made without replacements.
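The hypergeometrical formula can be evaluated directly to see how much it departs from the binomial. It is the product of two binomial coefficients divided by a third, so Python's `math.comb` handles it. The sketch assumes that $p_1S$ is a whole number (it is rounded). It uses a small illustrative population of 50 with 10 A's; these values and the function names are hypothetical.

```python
from math import comb

def hypergeometric(n1, n, s, p1):
    """Probability of n1 A's in a sample of n drawn without replacement
    from s cases, of which p1 * s have attribute A."""
    a = round(p1 * s)                       # number of A's in the population
    return comb(a, n1) * comb(s - a, n - n1) / comb(s, n)

def binomial(n1, n, p1):
    """The same probability when the sampling is with replacements."""
    return comb(n, n1) * p1**n1 * (1 - p1) ** (n - n1)

for s in (50, 5_000, 500_000):
    print(s, round(hypergeometric(2, 10, s, 0.2), 4))
print("binomial", round(binomial(2, 10, 0.2), 4))
```

With $S = 50$, the probability of 2 A's in a sample of 10 is about .337, against about .302 under sampling with replacements. With $S = 500{,}000$ the two agree very closely.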

If both the population and the sample are relatively small, then either the hypergeometrical distribution itself may be employed to calculate the probabilities of various sample results, or one of the Pearsonian frequency curves that approximate this distribution can be used for this purpose. Calculations of this kind are quite complicated, however, and will not be further considered here. The interested reader is referred to the Appendix to Chap. IV (page 81) for a discussion of the appropriate Pearsonian curve to fit, and to pages 127 to 131 for a method of measuring probabilities from such a curve.

When the population and sample are both moderately large, the hypergeometrical distribution can be approximated by a normal distribution, which again greatly simplifies the analysis. This approximation holds under three conditions:
- the sample $N$ is less than half the population $S$;
- $1/\sqrt{N}$, and hence $1/\sqrt{S - N}$, are so small as to be negligible;
- $Np_1$ and $Np_2$ are both moderately large.

The approximating normal distribution has mean $p_1$ and standard deviation given by the equation¹

$$\sigma = \sqrt{\frac{p_1p_2}{N}\left(1 - \frac{N}{S}\right)} \qquad (5)$$

When it is possible to use the normal distribution in place of the hypergeometrical distribution, the analysis proceeds in almost exactly the same manner as described in the preceding sections. The only difference is that Eq. (5) is used in place of the value $\sqrt{p_1p_2/N}$. It will be noticed that the effect of this modification is to reduce the value of $\sigma$ employed, since $1 - N/S$ is necessarily less than 1. This means that confidence intervals, regions of acceptance, and the like will be smaller when account is taken of the size of the population. As $S$ becomes large relative to $N$, however, the factor $1 - N/S$ becomes practically 1 and the size of the population may be disregarded. The analysis then becomes that of the previous sections of this chapter.
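Equation (5) amounts to a single extra factor under the square root. The Python sketch below, with a hypothetical function name and illustrative population sizes, takes a sample of 2,500 with $p_1 = .10$. It shows how the factor $1 - N/S$ shrinks $\sigma$ and how the effect fades as $S$ grows.

```python
import math

def sigma_pct(p1, n, s=None):
    """Standard deviation of a sample percentage. With a population size s,
    the variance is multiplied by the factor (1 - N/S) of Eq. (5)."""
    var = p1 * (1 - p1) / n
    if s is not None:
        var *= 1 - n / s
    return math.sqrt(var)

print("population ignored", round(sigma_pct(0.10, 2500), 5))
for s in (10_000, 100_000, 500_000):
    print(s, round(sigma_pct(0.10, 2500, s), 5))
```

For $S = 10{,}000$ the standard deviation falls from .006 to about .0052. For $S = 500{,}000$ it is about .00598, practically the value obtained when the population size is ignored.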


SAMPLES FROM POPULATIONS IN WHICH $p_1$ IS VERY SMALL


There are some instances in which the chance of a "favorable" occurrence is very small. If a large enough sample is taken, however, a few cases will be found. The example most commonly referred to is that of Bortkewitch. He found that for 10 Prussian army corps the average number of deaths occurring from horse kick during 20 years was .61 per corps per year. Another example is the counting of cells in certain biological tests and experiments. There the chance of occurrence of cells of a certain type on a given plate, or on a given section of a plate, is very small.²

Whenever the chance of a favorable case is very small, the binomial distribution is not well approximated by the normal distribution unless $N$ is very large.³ In these instances the


¹ Cf. BOWLEY, A. L., Elements of Statistics, 5th ed., pp. 282-284.
² Cf. FISHER, R. A., Statistical Methods for Research Workers, par. 1
³ See p. 73.


5. 
binomial distribution is better approximated by the Poisson distribution.¹ This has the form²

$$P(N_1) = \frac{m^{N_1}}{N_1!}\,e^{-m} \qquad (6)$$

which may also be written

$$P(N_1) = \frac{(Np_1)^{N_1}}{N_1!}\exp[-Np_1].$$

In these formulas:
- $P(N_1)$ represents the probability of $N_1$ cases out of $N$;
- $p_1$ is the probability of a favorable case, and $m = Np_1$;
- $e = 2.71828\ldots$ is the base of Napierian logarithms.

For example, if $p_1 = .003$ and $N = 1{,}000$, then $m = 3$ and the probabilities of 0, 1, 2, 3, 4, . . . favorable occurrences are³

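$N_1$     $P(N_1)$

0         $e^{-3}$ = .0498
1         $3e^{-3}$ = .1494
2         $(3^2/2!)e^{-3}$ = .2240
3         $(3^3/3!)e^{-3}$ = .2240
4         $(3^4/4!)e^{-3}$ = .1680
5         $(3^5/5!)e^{-3}$ = .1008
6         $(3^6/6!)e^{-3}$ = .0504
7         $(3^7/7!)e^{-3}$ = .0216
8         $(3^8/8!)e^{-3}$ = .0081
9         $(3^9/9!)e^{-3}$ = .0027
10        $(3^{10}/10!)e^{-3}$ = .0008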
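The table can be reproduced directly from Eq. (6), along with the cumulative sums worked out in the paragraphs that follow. This Python sketch, with illustrative function and variable names, evaluates $m^{N_1}e^{-m}/N_1!$ for $m = 3$.

```python
import math

def poisson(k, m):
    """Eq. (6): probability of k favorable cases when m = N * p1."""
    return m**k * math.exp(-m) / math.factorial(k)

m = 1000 * 0.003                         # N = 1,000 and p1 = .003, so m = 3
table = [round(poisson(k, m), 4) for k in range(11)]
print(table)

at_most_2 = sum(poisson(k, m) for k in range(3))         # two or fewer: about .4232
six_or_more = 1 - sum(poisson(k, m) for k in range(6))   # six or more: about .0839
print(round(at_most_2, 4), round(six_or_more, 4))
```

Computed without first rounding each term, the probability of six or more favorable cases comes to about .0839. This is slightly below the .0840 obtained from the four-place table.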


¹ Also known as the "law of small numbers" or "law of small chances."
² The derivation of this approximation is given in the Appendix to this chapter (p. 215).
³ These probabilities may be computed directly by noting that $\log_{10} e^{-3} = -3\log_{10} e = -3(.4343) = -1.3029$, and therefore $e^{-3} = .0498$. Alternatively, they may be obtained from Karl Pearson's Tables for Statisticians and Biometricians, Table LI.

Fig. 62.—The Poisson distribution, $m = 3$, $p_1 = .003$, $N = 1{,}000$ [see Eq. (6)].

If it is desired to find the probability of $N_1$ exceeding or falling short of a given value, it is necessary merely to sum the individual probabilities in question. For example, if $p_1 = .003$ and $N = 1{,}000$ as in the table above, the probability that two or fewer favorable cases will occur is equal to

.0498 + .1494 + .2240 = .4232.

The probability that six or more favorable cases will occur equals .0504 + .0216 + .0081 + .0027 + .0008 + . . . , which equals approximately .0836. The table is not completed in this direction, owing to the small probabilities. The probability in question is therefore more accurately computed by subtracting from 1 the probability of five or fewer favorable occurrences. The latter probability equals

.0498 + .1494 + .2240 + .2240 + .1680 + .1008 = .9160.

Hence a better approximation to the probability of six or more favorable cases is 1 − .9160 = .0840.

The use of the Poisson distribution in testing a hypothesis about a population may be illustrated as follows. Suppose it is wished to determine the chance of dying from a new type of
inoculation, say against smallpox. The existing method, it will be assumed, results in the death of not more than .5 per cent of those inoculated. The new method will not be adopted widely if its death rate appears to be significantly more than that. A thousand test cases are made, and of these seven die. Is it reasonable on the basis of these tests to reject the hypothesis that the true death rate from the inoculation is not more than .5 per cent?

To answer this question, set the coefficient of risk at approximately .05, and place the region of rejection all at the upper end of the distribution. (For the risk of accepting the hypothesis when the death rate is actually higher than .5 per cent is to be made a minimum.) It will be noted that the Poisson distribution, like the binomial distribution, is discrete, so that it may not be possible to obtain a region of rejection for which the probability is exactly .05. One coming as close as possible to this standard, however, will be adopted.

If .5 per cent is the true death rate and samples of 1,000 are taken, the distribution of sample death rates will be approximated by a Poisson distribution for which $m = (1{,}000)(.005) = 5$. The individual probabilities for the more likely sample results will thus be¹

$N_1$     $P(N_1) = e^{-5}5^{N_1}/N_1!$

0         .0067
1         .0337
2         .0842
3         .1404
4         .1755
5         .1755
6         .1462
7         .1044
8         .0653
9         .0363
10        .0181
11        .0082
12        .0034
13        .0013
14        .0005
15        .0002

¹ These are taken from Pearson's Tables for Statisticians and Biometricians, Table LI.
The total probability of a sample containing 8 or fewer deaths is .9319; thus the probability of a sample containing 9 or more deaths is .0681. This is close enough to .05 to take values of $N_1$ equal to 9 or more as the desired region of rejection. The actual sample has 7 deaths, which is not in the region of rejection. Therefore the hypothesis of a true death rate of 5 per 1,000 cannot be rejected. From the figures above it is seen that a sample could have as many as 8 deaths without causing the hypothesis of a general rate of 5 deaths per 1,000 to be rejected.
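The search for the region of rejection can be written out as a short loop. In the following Python sketch (names are illustrative), the cut-off $c$ is chosen as in the text. It is the value for which the probability of $c$ or more deaths under $m = 5$ comes as close as possible to .05.

```python
import math

def poisson_cdf(k, m):
    """P(N1 <= k) for a Poisson distribution with mean m."""
    return sum(m**j * math.exp(-m) / math.factorial(j) for j in range(k + 1))

def upper_rejection_start(m, risk=0.05):
    """Smallest number of deaths c in the region of rejection, chosen so that
    P(N1 >= c) comes as close as possible to `risk`."""
    return min(range(1, 100), key=lambda c: abs((1 - poisson_cdf(c - 1, m)) - risk))

m = 1000 * 0.005                     # m = N * p1 = 5
c = upper_rejection_start(m)
observed = 7
print(c, round(1 - poisson_cdf(c - 1, m), 4), "reject" if observed >= c else "cannot reject")
```

The two candidates are in fact nearly tied. Nine or more deaths has probability about .0681, and 10 or more about .0318, so 9 is only marginally the closer to .05.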

An upper bound for the true death rate that will have a probability of .95 of including the true rate may be found as follows. Note first that the sample return is 7 out of 1,000. Then examine Pearson's tables to see for what distribution the sum of the probabilities of 0 to 6 deaths per 1,000 is just .05. This will be found to be the distribution for which $m = 11.8$. Since the sample contains 1,000 cases and $m = Np_1$, it follows that the .95 upper bound for $p_1$, given a sample rate of .007, is .0118. It may be said, therefore, that the chances are 95 out of 100 that the range 0-1.18 per cent covers the true percentage.
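Instead of searching Pearson's tables, the value of $m$ can be found by bisection, since the probability of 6 or fewer deaths decreases steadily as $m$ increases. The Python sketch below follows the rule stated in the text: the probabilities of 0 to 6 deaths sum to .05. The function names and the search bracket of 0 to 100 are assumptions.

```python
import math

def poisson_cdf(k, m):
    """P(N1 <= k) for a Poisson distribution with mean m."""
    return sum(m**j * math.exp(-m) / math.factorial(j) for j in range(k + 1))

def poisson_upper_m(k, tail=0.05, lo=0.0, hi=100.0, iters=60):
    """Solve P(N1 <= k; m) = tail for m by bisection (the cdf falls as m rises)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > tail:
            lo = mid          # cdf still too large: the root lies at a larger m
        else:
            hi = mid
    return (lo + hi) / 2

m_upper = poisson_upper_m(6)                          # sum over 0 to 6 deaths, as in the text
print(round(m_upper, 2), round(m_upper / 1000, 4))   # m and the upper bound for p1
```

The search gives $m \approx 11.84$, that is, an upper bound of about .0118 for $p_1$.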

The mean of a Poisson distribution is $m$, its variance is also $m$, its $\beta_1 = 1/m$, and its $\beta_2 = 3 + 1/m$. When $N$ is so large that $Np_1 = m$ is also large, the Poisson distribution is fairly well approximated by the normal curve. This is merely another way of saying that if $N$ is big enough, the binomial distribution approaches the normal distribution. This holds even when $p_1$ is very small and hence $p_2 - p_1$ is relatively large. In this instance it makes little difference whether the variance of the normal distribution is taken as $m = Np_1$ or as $Np_1p_2$, since $p_2$ is very close to 1.
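These four properties can be verified numerically by summing over the distribution. The Python sketch below is a check rather than a proof. It truncates the infinite sum at a point where the remaining probability is negligible; the cut-off rule is an assumption. It then compares the results with $m$, $m$, $1/m$, and $3 + 1/m$.

```python
import math

def poisson_pmf(k, m):
    return m**k * math.exp(-m) / math.factorial(k)

def poisson_moments(m, kmax=None):
    """Mean, variance, beta1 and beta2 by direct summation over k = 0..kmax."""
    kmax = kmax or int(m + 20 * math.sqrt(m) + 20)    # probability beyond this is negligible
    probs = [poisson_pmf(k, m) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var, mu3, mu4 = (sum((k - mean) ** r * p for k, p in enumerate(probs)) for r in (2, 3, 4))
    return mean, var, mu3**2 / var**3, mu4 / var**2

for m in (0.5, 3, 30):
    print(m, [round(x, 4) for x in poisson_moments(m)])
```

As $m$ grows, $\beta_1$ shrinks toward 0 and $\beta_2$ toward 3, the values for the normal curve.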


APPENDIX 
Derivation of the Poisson Distribution and Its Properties. 
The Poisson distribution is derived as follows. According to the binomial equation, the probability of $N_1$ cases out of $N$ is²

$$P(N_1) = \frac{N!}{N_1!\,N_2!}\,p_1^{N_1}p_2^{N_2} \qquad (1)$$

where $p_1 + p_2 = 1$ and $N_1 + N_2 = N$.

¹ See p. 73.
² See p. 43.
By dividing numerator and denominator by $N_2! = (N - N_1)!$ and setting $p_2 = 1 - p_1$ and $N_2 = N - N_1$, this may be written

$$P(N_1) = \frac{N(N - 1)(N - 2)\cdots(N - N_1 + 1)}{N_1!}\,p_1^{N_1}(1 - p_1)^{N - N_1} \qquad (2)$$

Suppose that $p_1$ is very small but that $N$ is large enough so that $Np_1$ has a value of, say, at least .1. For example, if $p_1 = .003$, let $N = 1{,}000$, so that the value of $Np_1$ is 3. Represent $Np_1$ by $m$, so that $p_1 = m/N$. If this value of $p_1$ is put in the foregoing equation, then

$$P(N_1) = \frac{N(N - 1)(N - 2)\cdots(N - N_1 + 1)}{N_1!}\left(\frac{m}{N}\right)^{N_1}\left(1 - \frac{m}{N}\right)^{N - N_1} \qquad (3)$$

Distribute $(1/N)^{N_1}$ throughout the first $N_1$ factors and separate $(1 - m/N)^{N - N_1}$ into its two components. Eq. (3) then gives

$$P(N_1) = \frac{m^{N_1}}{N_1!}\left(1 - \frac{1}{N}\right)\left(1 - \frac{2}{N}\right)\cdots\left(1 - \frac{N_1 - 1}{N}\right)\left(1 - \frac{m}{N}\right)^{N}\left(1 - \frac{m}{N}\right)^{-N_1} \qquad (4)$$

But if $N$ is large, as it must be to make $Np_1$ equal to .1 or more, the values of $1/N, 2/N, \ldots, (N_1 - 1)/N$ will be negligible. The value of $(1 - m/N)^{N}$ will be approximately $e^{-m}$. The value of $(1 - m/N)^{-N_1}$ will be practically 1, since $m/N$ is practically negligible. Hence

$$P(N_1) = \frac{m^{N_1}}{N_1!}\,e^{-m} \qquad (5)$$

or it may be written

$$P(N_1) = \frac{m^{N_1}}{N_1!}\exp[-m] \qquad (6)$$

This is known as the Poisson distribution after the man who first developed it.
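The limiting argument can also be watched numerically. Hold $m = Np_1$ fixed at 3 and let $N$ grow, comparing the exact binomial probabilities of Eq. (1) with the Poisson values. This Python sketch uses illustrative names and compares only the outcomes 0 to 10.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, m):
    return m**k * math.exp(-m) / math.factorial(k)

m = 3.0                     # hold m = N * p1 fixed while N grows
gaps = {}
for n in (10, 100, 1_000, 10_000):
    p1 = m / n
    # largest absolute difference over the outcomes N1 = 0, ..., 10
    gaps[n] = max(abs(binomial_pmf(k, n, p1) - poisson_pmf(k, m)) for k in range(11))
    print(n, f"{gaps[n]:.5f}")
```

The largest discrepancy falls roughly in proportion to $1/N$, as the neglected factors $(1 - 1/N), (1 - 2/N), \ldots$ suggest.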
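The mean of a Poisson distribution is $m$, which may be shown as follows.

First note that

$$\sum_{N_1=0}^{\infty}\frac{m^{N_1}}{N_1!}\exp[-m] = 1 \qquad (7)$$

since the sum of the total probabilities of a distribution equals 1.

Next note that the mean of the Poisson distribution is by definition

$$\text{Mean } N_1 = \sum_{N_1=0}^{\infty} N_1\,\frac{m^{N_1}}{N_1!}\exp[-m] \qquad (8)$$

Note further what happens if $N_1$ in the numerator is canceled against the $N_1$ factor in $N_1!$ and $m$ is factored out of $m^{N_1}$. Eq. (8) then becomes

$$\text{Mean } N_1 = m\sum_{N_1=1}^{\infty}\frac{m^{N_1 - 1}}{(N_1 - 1)!}\exp[-m].$$

But if $N_1 - 1$ is set equal to $N_1'$, the sum term becomes

$$\sum_{N_1'=0}^{\infty}\frac{m^{N_1'}}{N_1'!}\exp[-m] \qquad (9)$$

which by Eq. (7) equals 1. Hence

$$\text{Mean } N_1 = m \qquad (10)$$

Similarly, the variance $\sigma^2$ of $N_1$ equals $m$, which may be shown as follows. By definition,

$$\sigma^2_{N_1} = \sum_{N_1=0}^{\infty} N_1^2\,\frac{m^{N_1}}{N_1!}\exp[-m] - m^2.$$

But $N_1^2 = N_1(N_1 - 1) + N_1$; therefore,

$$\sigma^2_{N_1} = \sum_{N_1=0}^{\infty} N_1(N_1 - 1)\,\frac{m^{N_1}}{N_1!}\exp[-m] + \sum_{N_1=0}^{\infty} N_1\,\frac{m^{N_1}}{N_1!}\exp[-m] - m^2.$$

The first term of the right-hand side of this expression equals

$$m^2\sum_{N_1=2}^{\infty}\frac{m^{N_1 - 2}}{(N_1 - 2)!}\exp[-m],$$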
which as above equals $m^2$, and the second term equals $m$, as demonstrated in the previous paragraph. Hence

$$\sigma^2_{N_1} = m^2 + m - m^2$$

or

$$\sigma^2_{N_1} = m \qquad (11)$$

In the same manner it can be shown that $\mu_3 = m$ and $\mu_4 = m + 3m^2$. Hence

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{1}{m} \quad\text{and}\quad \beta_2 = \frac{\mu_4}{\mu_2^2} = 3 + \frac{1}{m}.$$

These latter formulas suggest that, if $N$ and hence $m$ is large, the Poisson distribution approaches the normal form; for $\beta_1$ then equals approximately 0, and $\beta_2$ equals approximately 3. In fact, this approach to normality with increasing size of $m$ can be mathematically demonstrated in a manner essentially the same as that used to demonstrate the approach of the binomial distribution to normality.


CHAPTER X 


SAMPLING FROM CONTINUOUS NORMAL POPULATIONS 
I. VARIOUS SAMPLING DISTRIBUTIONS 


The foregoing chapter was concerned with attributes that 
are qualitative, such as “defective” versus “nondefective,” “for” 
versus “against,” “black” versus “white,” or attributes that 
have only discrete numerical values. This and the following chapter will be concerned with attributes that may theoretically vary continuously, such as the heights of individuals, yields of wheat, and the like.

The theory of sampling for a continuous variable has been most completely worked out for a normal population, i.e., a population in which the probability, or relative frequency, of a given variable is measured by the normal probability curve.
Because of this and because hypotheses of normal populations 
arise frequently in practical problems, the present chapter will 
develop the analysis in considerable detail. 

The argument will apply strictly to a hypothetical infinite 
population. It might, for example, apply to the infinite set 
of electric-light bulbs that could be produced by a given process 
if that process were to be used indefinitely without modification. 
It might also apply to the infinite set of crops of a given variety 
that might be yielded by repeated farming of a given type of 
land with a given type of treatment. Again it might apply 
to the infinite set of white adult males living now and in the 
future. In fact, it might apply to the results of many types of 
repetitive or continuous physical, biological, or social processes. 
The argument will also be applicable without serious error to a 
large finite population that is distributed approximately in a 
normal manner, such as the heights of existing white adult males. 
In this instance the population will be presumed to be so large 
relative to the size of the sample that the withdrawal of the 
sample does not materially change the distribution of probabilities in the population. Thus, if the probability of a case lying between 1.2 and 1.3 is .09, say, before the sampling is begun, it will be assumed to remain equal to .09 throughout the whole of the sampling process.

The analysis will assume that the process of sampling is a random one. This implies, as previously, that the results, or rather the distribution of the results of repeated sampling from the given population, can be predicted with reasonable accuracy by the use of an appropriate mathematical model. The principal task of the analysis will be to develop such a model.

The problem to be considered will be this: A random sample is obtained from a given normal population. Being normal, the population may be specified by the mean and standard deviation; for the $\beta_1$ of all normal populations is 0, and the $\beta_2$ is equal to 3. Hence the problem will be to make inferences about the mean and standard deviation of the population from knowledge of the mean and standard deviation of the sample.

To solve this and similar problems, it will be necessary to consider what would happen under repeated random sampling from the given population. For this purpose, such statistics as the mean and variance are selected, and a study is made of how these statistics vary from sample to sample. Such a study seeks in particular to estimate the relative frequencies with which the selected statistics will assume various values. These frequencies refer to the infinite set of samples of given size that might be drawn from the given population (with replacements of the samples if the population is finite). That is, it seeks to derive the sampling distributions of the selected statistics.

To derive these sampling distributions by actually drawing a large number of random samples from the given population would be a tedious if not an impossible task in almost all cases. Fortunately, the derivation may be undertaken by a theoretical process similar to that by which the sampling distribution of percentages was derived in the preceding chapters. The steps in the theoretical argument are to find a mathematical model that appears to reproduce the conditions of sampling and then to derive the distributions of the selected statistics for this model by means of the probability calculus.

On the assumption that the process of sampling is a random one, it is argued that the actual distributions of statistics among a large set of samples from the given population will tend to conform to the theoretical distributions worked out for the given mathematical model. The basis for this, it will be recalled,¹ is intuition and experience with respect to mass phenomena, or what has been called the "law of large numbers." The task of the present chapter is to set up the appropriate mathematical model and to derive the theoretical sampling distributions for various statistics.


DISTRIBUTION OF SAMPLES OF 2 FROM A NORMAL POPULATION 


For simplicity suppose that a sample contains only two cases, 
that is, N = 2. Of course, in practical work, samples rarely 
contain such a small number of cases. Much is to be gained in 
the theoretical analysis, however, by taking samples of 2. For 
the essentials of the argument are the same for these samples 
of 2 as they are for large samples, but the argument is much 
simpler and easier to comprehend. If the argument for samples 
of 2 is clearly understood, the generalization for samples of larger 
size is not very difficult. 

In accordance with the assumption noted above, the popula- 
tion is assumed to be so large that the withdrawal of the sample 
does not materially change the distribution of probabilities in 
the population. Hence the withdrawal of two cases from a single population will be essentially the same as withdrawing one case from one population and another case from another population identical in all respects to the first. Similarly, repeated withdrawals of samples of 2 from a single population
(with replacements of samples if the population is finite) will be 
essentially the same as repeated withdrawals of samples of 1 
each from two identical populations. This suggests that the 
sampling distributions of means, variances, and other statistics 
of samples of 2 from a normal population may be derived from 
the following mathematical model: 

The model is an arithmetical model instead of an algebraic 
one; this is intended to facilitate the exposition for the reader 
who does not have a ready knowledge of the calculus. However, 
algebraic equations will be given for all results obtained; and, 
for those who have the mathematical training, the translation of the numerical argument into its algebraic counterpart is given in the appendix of this chapter.

¹ Cf. SMITH, J. G., and A. J. Duncan, Elementary Statistics and Applications, pp. 239-241. See also pp. 27-28.

The model supposes two populations identical in all respects;
each population has 100,000 cases, which, when grouped in class
intervals of 2, have relative frequencies of a normal frequency
distribution. These relative frequencies are given in Table 25.
The mean of each population is 100, and its standard deviation is
10. Now suppose that each case of population I, like that shown
in Table 25, is combined without restriction with each case of
population II, also like that shown in Table 25, and suppose that
the pairs of values so obtained are checked off on a diagram
such as that illustrated in Fig. 63. For example, if a case from
population I (call it X₁) belongs to the interval 98- and a case
from population II (call it X₂) belongs to the interval 102-, then
this pair of cases (this sample of 2) will be checked off as
belonging to the cell abcd of Fig. 63.

Fig. 63.—Probability of a joint sample X₂ = 102-104 and X₁ = 98-100.
TABLE 25.—DISTRIBUTION OF PROBABILITIES FOR A NORMAL POPULATION
(X̃ = 100 and σ̃ = 10)

Lower Limits       Probabilities¹
of Class           (Frequencies Relative
Intervals          to the Total of 100,000)

58-                .00002
60-                .00004
62-                .00009
64-                .00018
66-                .00035
68-                .00066
70-                .00121
72-                .00210
74-                .00354
76-                .00570
78-                .00885
80-                .01318
82-                .01887
84-                .02596
86-                .03431
88-                .04359
90-                .05320
92-                .06239
94-                .07033
96-                .07616
98-                .07926
100-               .07926
102-               .07616
104-               .07033
106-               .06239
108-               .05320
110-               .04359
112-               .03431
114-               .02596
116-               .01887
118-               .01318
120-               .00885
122-               .00570
124-               .00354
126-               .00210
128-               .00121
130-               .00066
132-               .00035
134-               .00018
136-               .00009
138-               .00004
140-               .00002

¹ Derived by successive subtraction of the .2 intervals in the five-place normal probability area table in F. C. Mills, ...; D. H. Davenport, A Manual of ...

If such pairs of cases, or samples of 2, are formed without
restriction, as is supposed, then according to the multiplication
theorem for independent probabilities, the probability (relative
frequency) of a pair of cases belonging to any one cell will be
equal to the probability (relative frequency) of a case belonging to
the interval from which the first member of the pair was selected,
times the probability (relative frequency) of a case belonging to
the interval from which the second member of the pair was
selected. For example, if the probability of a case belonging to
the interval 102- in population II is .07616 and the probability
of a case belonging to the interval 98- in population I is .07926,
then the probability of a pair of cases belonging to the cell 98-
for X₁ and 102- for X₂, that is, cell abcd of Fig. 63, is


(.07926) (.07616) = .006036. 
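The class probabilities of Table 25 and the product just computed can be reproduced with a short Python sketch. It assumes an exactly normal population (mean 100, standard deviation 10) in place of the 100,000-case populations and obtains each class probability by subtracting cumulative normal areas, as the table's footnote describes; the helper names are illustrative only.

```python
import math

MEAN, SD = 100.0, 10.0  # the population of Table 25

def normal_cdf(x):
    """Area under the normal curve (mean MEAN, s.d. SD) to the left of x."""
    return 0.5 * (1.0 + math.erf((x - MEAN) / (SD * math.sqrt(2.0))))

def class_prob(lower, width=2.0):
    """Probability that a case falls in the class interval [lower, lower + width)."""
    return normal_cdf(lower + width) - normal_cdf(lower)

p98 = class_prob(98)     # interval 98- (the value of X1)
p102 = class_prob(102)   # interval 102- (the value of X2)

# Multiplication theorem for independent probabilities: the cell abcd.
cell_abcd = p98 * p102
print(f"{p98:.5f} {p102:.5f} {cell_abcd:.6f}")
```

Because the sketch multiplies unrounded areas, its product differs from .006036 only in the last printed digit.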


The probability of any pair of cases among the set of all possible 
pairs of cases may thus be calculated by simple multiplication 
of two elementary probabilities. This has been done for the 
more probable pairs of cases, and the results are presented in 
Fig. 64. 

It may appropriately be argued that this distribution of 
pairs of cases is a good prediction of the results that would 
be obtained by drawing one item at random from population I 
and another at random from population II, recording the results, 
and then replacing the cases and repeating the process. For 
if the process of selection is a random one, each case in each
population will be drawn just about as often as every other
case; and if the selection of the first case does not influence the
selection of the second, then no particular pair of cases will
tend to occur any more frequently than any other pair of cases.
As a consequence, the relative frequencies of various types of 
samples will, in repeated sampling, be approximately the same 
as the relative frequencies of various types of combinations among 
the set of all possible combinations of one item each from the two 
populations. 

According to the argument presented above, the latter will
also be a good approximation to the relative frequencies of
various types of samples of 2 from a single population. For 
in this case, the original population can be viewed as population I, 
and the population remaining after the first case has been drawn 
can be viewed as population II, the form of the latter being 
practically the same as that of population I. Relative fre- 
quencies in Fig. 64 therefore constitute a good prediction of the 
distribution of a very large number of samples of 2 from a normal 
population whose mean is 100 and whose standard deviation is 10.
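The claim that repeated random drawings reproduce these products can be checked by simulation. The sketch below is a minimal illustration under stated assumptions: it draws the two cases independently from a continuous normal population with `random.gauss` (standing in for a very large finite population, or for two identical ones) and counts how often a pair falls in cell abcd of Fig. 63.

```python
import random

random.seed(1)            # fixed seed so the run is reproducible
MEAN, SD = 100.0, 10.0
TRIALS = 200_000

hits = 0
for _ in range(TRIALS):
    x1 = random.gauss(MEAN, SD)   # first case of the sample
    x2 = random.gauss(MEAN, SD)   # second case, drawn independently
    if 98 <= x1 < 100 and 102 <= x2 < 104:   # cell abcd of Fig. 63
        hits += 1

observed = hits / TRIALS
predicted = 0.07926 * 0.07616   # product of the Table 25 probabilities
print(f"observed {observed:.5f}  predicted {predicted:.5f}")
```

With 200,000 trials the observed relative frequency typically agrees with the product to about the fourth decimal place.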

Properties of the Distribution. Its Circular Symmetry. The
most striking feature of the set of samples represented in Fig. 64
is the symmetrical distribution of the samples. They tend to
cluster most closely around the point whose coordinates are both
equal to the mean value, 100, and to thin out evenly in all
directions from that point. In fact, it will be noted that cells lying
equally distant from this central point have approximately equal
probabilities.
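The near-equality of equidistant cells can be checked directly from Table 25. In this sketch (helper names hypothetical) the cell with midpoint (107, 101) and the cell with midpoint (105, 105), coordinates in the order X₁, X₂, both lie √50 units from the central point (100, 100).

```python
import math

# Class probabilities taken from Table 25 (keyed by lower class limit).
TABLE_25 = {100: 0.07926, 104: 0.07033, 106: 0.06239}

def cell_prob(lower1, lower2):
    """Probability of the cell X1 in [lower1, lower1+2), X2 in [lower2, lower2+2)."""
    return TABLE_25[lower1] * TABLE_25[lower2]

def distance_to_center(lower1, lower2, center=100.0):
    """Distance from the cell's midpoint to the central point (100, 100)."""
    return math.hypot(lower1 + 1 - center, lower2 + 1 - center)

first = cell_prob(106, 100)    # midpoint (107, 101)
second = cell_prob(104, 104)   # midpoint (105, 105)
print(distance_to_center(106, 100), distance_to_center(104, 104))
print(f"{first:.6f} {second:.6f}")
```

The two probabilities, about .004945 and .004946, differ only in the sixth decimal place.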

If the cells had been made smaller, this circular symmetry 
would have been more evident. Careful study also shows that 
the probabilities in each row and each column tend to conform to 
a normal frequency distribution with the same mean and same
standard deviation as the original population. 

Figure 64 is typical of a large set of samples from a normal
population, whatever the size of the sample. In every case
the various samples cluster most closely around the point whose
coordinates are all equal to the mean of the population, and in
every case the distribution of samples around this point conforms
to a symmetrically circular (or spherical) pattern.

Geometrical Measurement of the Mean of a Sample. Figure 64
designates a sample by indicating the interval to which case I
belongs and the interval to which case II belongs. It may also
be used to find the interval to which the mean of any sample
belongs. This follows from the geometrical properties of the
figure.

By definition the mean of any sample of two cases is

$$\bar{X} = \frac{X_1 + X_2}{2}$$


For a given value of X̄, this is the equation of a line in Fig. 64
that runs through the point X₂ = 2X̄ on the X₂-axis and the
point X₁ = 2X̄ on the X₁-axis. All X₁,X₂ sample combinations
that lie on this line have the given mean value. For example, in
Fig. 64 any sample point lying on line AB, such as points (100,90),
(98,92), (94,96), (88,102), and (102,88), has a mean of 95.
Coordinates are given in the order X₁,X₂. In general, the plane of
Fig. 64 may be covered with a set of parallel lines, such as line AB,
any one of which is the locus of all sample combinations having
the same mean. Geometrically, the slope of all these parallel
lines is such that an increase in X₁ is matched by an equal decrease
in X₂, or vice versa, so that the mean of the two remains the same.

The parallel lines that could be drawn perpendicular to lines like 
AB (for example, like GH) represent samples such that, whenever 
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X; increases by a given quantity, X» increases also by the same 
given quantity. Thus, on line GH, if X; is 94, X; is 86, whereas, 
if X, is 96, Xə is 88—as X, has increased by 2, X» has also increased 
by 2. Of this latter group of sample combinations one set is of 
special interest, viz., that for which X; and X; have the same value, 
or, algebraically, X; = X». In Fig. 64 all samples lying on line CD, 
which passes through the origin and bisects the angle between 
the axes, have values of X; = Xo. In fact, line CD is the locus 
of all sample combinations in which Х, = Xe. But in addition, 
when X; = X,, it also follows that X = Х, = Xs. As a conse- 
quence, every point on line CD has a pair of equal coordinates 
whose value is the mean of those samples lying on the line 
perpendicular to CD through this point. This follows from the 
fact that on a line perpendicular to CD, such as AB, the mean is 
constant. 

As a result of these geometrical properties, the mean of any
sample in the diagram can be found immediately as follows:
First locate the sample point on the diagram. Then proceed
up or down from this point along a line perpendicular to the
line CD until the intersection with CD is reached. The mean
of the sample is either the X₁ or the X₂ coordinate (for X₁ = X₂)
of this intersection point. For example, the sample point
(84,106) lies on a line that intersects CD in a point whose X₂
coordinate is 95. Hence the mean of this sample is 95.
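The projection rule can be confirmed with ordinary vector arithmetic. The sketch below (names hypothetical) drops a perpendicular from each sample point to CD, taken as the line X₁ = X₂ through the origin, and shows that the foot of the perpendicular has both coordinates equal to the sample mean.

```python
import math

# Unit vector along CD, the line X1 = X2 through the origin.
U = (1 / math.sqrt(2), 1 / math.sqrt(2))

def foot_on_cd(x1, x2):
    """Foot of the perpendicular dropped from the sample point (x1, x2) to CD."""
    along = x1 * U[0] + x2 * U[1]          # scalar projection onto CD
    return (along * U[0], along * U[1])

# Sample points on line AB of Fig. 64 (coordinates in the order X1, X2).
samples_on_ab = [(100, 90), (98, 92), (94, 96), (88, 102), (102, 88), (84, 106)]
for x1, x2 in samples_on_ab:
    foot = foot_on_cd(x1, x2)
    print((x1, x2), "foot", tuple(round(c, 6) for c in foot), "mean", (x1 + x2) / 2)
```

Every point on AB projects to (95, 95), which is why the lines perpendicular to CD serve as lines of constant mean.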

The properties of Fig. 64 also make it possible to find the
interval to which any sample mean belongs. Thus suppose
intervals are marked off on the X₂- (or X₁-) axis, the
corresponding points on the line CD are found by vertical (or
horizontal) projection, and lines perpendicular to CD are drawn
through these points. Then to find the interval to which the
mean of a sample belongs it is necessary to find merely the pair
of lines between which the sample lies. For example, the ...
found very useful in ...

Geometrical Measurement of the Variance and Standard Deviation
of a Sample. Figure 64 may also be used to find the interval
within which the variance and standard deviation of a sample
lie. First note that by definition the variance of a sample of two
items is

$$\sigma^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2}{2}$$

With reference to Fig. 64 this may be interpreted as one-half
the square of the distance¹ from the point whose sample values are
both equal to X̄ to the point whose sample values are X₁ and X₂.
For example, one-half the square of the distance from the
point (102,88) to the point (95,95) is the variance of the sample
102,88. Likewise, one-half the square of the distance from the
point (108,82) to the point (95,95) is the variance of the sample
108,82, a sample that has the same mean as 102,88 but a
different variance. Similarly, one-half the square of the distance
from the point (102,92) to the point (97,97) is the variance
of the sample 102,92, a sample whose mean and variance both
are different from those of the sample 102,88.

It follows from this that all sample points equally distant
from the points representing their means have the same variances.
Since the line CD is the locus of the points representing the
means of the various samples,² it may be concluded that all
sample points equally distant from the line CD (in a
perpendicular direction) have the same variances. Thus, if intervals
are marked off on some line perpendicular to CD and if lines are
drawn parallel to CD through the end points of these intervals,
the interval to which the variance of any sample belongs may
be found by merely noting the pair of lines between which it
lies. For example, the line EF is 2√2 units³ distant from the
line CD. All samples on this line therefore have variances
equal to ½(2√2)² = 4. The line GH is 4√2 units distant from
line CD, and all samples on this line have variances equal to 16.
The sample 98,104 lies between these two lines and hence has a
variance lying between 4 and 16.

Since the standard deviation of a sample is equal to the
square root of the variance, the above method of finding the
variance of a sample also automatically gives its standard
deviation. Thus the lines EF and GH include samples whose
standard deviations lie between 2 and 4.

This geometrical method of finding the intervals to which
the variance and standard deviation of a sample belong is very
helpful in deriving the sampling distributions of these two
statistics.

¹ It will be recalled that the distance between points (X₁,Y₁) and (X₂,Y₂) is √[(X₁ − X₂)² + (Y₁ − Y₂)²].
² See pp. 225-226.
³ Each side of a cell is equal to 2 units; therefore, the diagonal is equal to 2√2 units.
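The two geometric readings of the variance used above can be checked numerically. This sketch (function names hypothetical) compares the defining formula with one-half the squared distance to the mean point, and with one-half the squared perpendicular distance to CD, for the samples cited in the text.

```python
import math

def variance_of_two(x1, x2):
    """Variance of a sample of 2 with divisor N = 2, as defined in the text."""
    mean = (x1 + x2) / 2
    return ((x1 - mean) ** 2 + (x2 - mean) ** 2) / 2

def half_squared_distance(x1, x2):
    """One-half the squared distance from (x1, x2) to its mean point (mean, mean)."""
    mean = (x1 + x2) / 2
    return math.dist((x1, x2), (mean, mean)) ** 2 / 2

def distance_from_cd(x1, x2):
    """Perpendicular distance from (x1, x2) to the line CD (X1 = X2)."""
    return abs(x1 - x2) / math.sqrt(2)

for sample in [(102, 88), (108, 82), (102, 92), (98, 104)]:
    var = variance_of_two(*sample)
    print(sample, var, round(half_squared_distance(*sample), 9),
          round(distance_from_cd(*sample) ** 2 / 2, 9), "s.d.", math.sqrt(var))
```

The sample 98,104 gives a variance of 9 and a standard deviation of 3, consistent with its position between the lines EF and GH.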

Geometric Measurement of the Sample Statistic √N (X̄ − X̃)/σ̂.
As will be pointed out later, when the standard deviation of the
population is not known and the sample is small, the statistic

$$\frac{\sqrt{N}\,(\bar{X} - \tilde{X})}{\hat{\sigma}}$$

is the best to employ in testing hypotheses and determining
confidence limits for the mean of the population. In this statistic,
X̄ represents the mean of the sample, X̃ is the hypothetical value
for the mean of the population, and σ̂ is the maximum-likelihood
estimate of the standard deviation of the population. The
relationship between the value of σ̂ and the standard deviation of
the sample is

$$\hat{\sigma} = \sigma\sqrt{\frac{N}{N-1}}$$

The derivation of this maximum-likelihood value is given in the next
chapter.

Fig. 65.


In Fig. 64, the statistic √N (X̄ − X̃)/σ̂ may be interpreted
geometrically as follows: First, in Fig. 65a and 65b, note that
√N (X̄ − X̃), which for samples of 2 becomes √2 (X̄ − X̃), is
equal to the distance from the point h, whose coordinates are
both equal to X̃, to the point m, whose coordinates are both
equal to X̄.* Again note that, for samples of 2, σ̂ = σ√2 and
that this latter is the perpendicular distance from the sample point
(X₁,X₂) to the mean point (X̄,X̄). Hence, if the sample point
lies above the line CD, as in Fig. 65a, the statistic √N (X̄ − X̃)/σ̂
is but the cotangent of the angle¹ that the line connecting that
point with the central point (X̃,X̃) makes with CD. If the sample
point lies below CD, as in Fig. 65b, then the statistic
√N (X̄ − X̃)/σ̂ equals minus the cotangent of the angle that the
line connecting the sample point with the central point (X̃,X̃)
makes with CD.
If lines are thus drawn through the central point (X̃,X̃) so
that the cotangents of the angles they make with CD form
regular intervals of, say, .2, then to find the interval within
which the statistic √N (X̄ − X̃)/σ̂ lies for any sample it is
necessary merely to note between what pair of lines the sample point
lies. For example, the line LL in Fig. 64 makes an angle with
CD the cotangent of which is 2, and the line KK makes an angle
the cotangent of which is 1.8. Hence, the sample 120,106, which
lies between these two lines and above CD, has a value for the
statistic √N (X̄ − X̃)/σ̂ that lies between 1.8 and 2 (its exact
value is 1.857). Similarly, the lines L′L′ and K′K′ form angles
with CD of which the cotangents are −2 and −1.8. Hence
the sample 106,120, which lies between these lines and below
CD, has a value for √N (X̄ − X̃)/σ̂ that lies between 1.8 and 2.
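The cotangent reading can be compared with the direct formula in a short sketch. It assumes the hypothetical mean X̃ = 100 (the center of Fig. 64) and the text's convention that a point with X₁ > X₂ lies above CD; the function names are illustrative.

```python
import math

def t_statistic(x1, x2, hyp_mean):
    """sqrt(N) * (mean - hyp_mean) / sigma_hat for a sample of N = 2."""
    n = 2
    mean = (x1 + x2) / 2
    sigma = abs(x1 - x2) / 2                    # sample s.d. (divisor N)
    sigma_hat = sigma * math.sqrt(n / (n - 1))  # = sigma * sqrt(2) when N = 2
    return math.sqrt(n) * (mean - hyp_mean) / sigma_hat

def t_from_angle(x1, x2, hyp_mean):
    """The same statistic read geometrically as +/- cot(angle with CD)."""
    mean = (x1 + x2) / 2
    along = math.sqrt(2) * (mean - hyp_mean)    # distance h -> m, measured along CD
    perp = (x1 - x2) / math.sqrt(2)             # signed; positive above CD (X1 > X2)
    angle = math.atan2(perp, along)             # counterclockwise from CD
    cot = math.cos(angle) / math.sin(angle)
    return cot if perp > 0 else -cot

for sample in [(120, 106), (106, 120)]:
    print(sample, round(t_statistic(*sample, 100), 3), round(t_from_angle(*sample, 100), 3))
```

Both samples have a mean of 113, so both give 13/7 ≈ 1.857, one as a cotangent above CD and the other as minus a cotangent below it.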


* CD makes a 45-degree angle with the X₁-axis, since it is the locus of all points whose two coordinates are identical. Hence in the right triangle hlm the distance mh = √2 times the distance ml.
¹ All angles are viewed as being measured counterclockwise from CD.

SAMPLING DISTRIBUTION OF THE MEAN

The foregoing properties of the distribution of samples of 2
from a normal population offer a ready means of obtaining the
sampling distributions of various sample statistics. Consider
first the sampling distribution of the mean.

By definition, the sampling distribution of sample means
is a description of the relative frequencies with which samples
have means of various values. In any concrete problem, such as
the present one, it consists of a list of intervals together with
the relative frequencies of samples whose means fall in these
intervals. As pointed out above, it is possible from Fig. 64
to find the interval within which the mean of any sample of two
lies. Hence to find the sampling distribution of the mean of
samples of two cases it is necessary only to lay off a given range
of intervals for the mean and then determine from Fig. 64 the
relative frequencies of samples lying in the various intervals. The
exact procedure may be illustrated by the following example:
Suppose it is desired to find the relative frequency or
probability of a mean lying between 95 and 97 (i.e., from 95 up to
but not including 97). To do this, first proceed from the points
95 and 97 on the X₂-axis to the points vertically above on the
CD line. Draw lines through these points perpendicular to CD.
These are the lines AB and A′B′ of Fig. 64. The relative
frequency, or probability, of samples having mean values lying
between 95 and 97 is equal to the relative frequency, or
probability, of samples lying between the lines AB and A′B′. This
is equal to the probabilities of all cells completely included
between these two lines; i.e., it is equal to

$$\frac{535.6}{100{,}000} + \frac{494.5}{100{,}000} + \frac{421.7}{100{,}000} + \cdots$$

plus the prorated share of the probability of those cells only partly
included between these two lines, that is,

$$\frac{1}{2}\cdot\frac{580.0}{100{,}000} + \frac{1}{2}\cdot\frac{475.2}{100{,}000} + \frac{1}{2}\cdot\frac{421.7}{100{,}000} + \frac{1}{2}\cdot\frac{494.5}{100{,}000} + \cdots$$

The grand total is 9,570/100,000, and this is accordingly the
probability of a sample having a mean between 95 and 97.
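The cell-by-cell summation can be automated. The sketch below assumes, as the prorating in the text implies, that probability is spread evenly over each cell; a cell with lower limits a, b contains sums X₁ + X₂ from a + b to a + b + 4, so it lies wholly in the band when a + b = 190 and is cut along its diagonal (one-half counted) when a + b is 188 or 192. The class range 40 to 160 is an assumption chosen so the omitted tails are negligible.

```python
import math

MEAN, SD, WIDTH = 100.0, 10.0, 2   # population of Table 25 and class width

def normal_cdf(x):
    return 0.5 * (1 + math.erf((x - MEAN) / (SD * math.sqrt(2))))

LOWERS = range(40, 160, WIDTH)
PROB = {a: normal_cdf(a + WIDTH) - normal_cdf(a) for a in LOWERS}

def prob_mean_band(lo):
    """P(lo <= mean < lo + 2) for samples of 2, summed over the grid of cells."""
    total = 0.0
    for a in LOWERS:
        for b in LOWERS:
            s = a + b
            if s == 2 * lo:                          # cell wholly between the lines
                total += PROB[a] * PROB[b]
            elif s in (2 * lo - 2, 2 * lo + 2):      # cell cut along a diagonal
                total += 0.5 * PROB[a] * PROB[b]
    return total

grid = prob_mean_band(95)
# For comparison: the exact mean of 2 is normal with variance SD**2 / 2 = 50,
# and Phi((x - 100) / sqrt(50)) = 0.5 * (1 + erf((x - 100) / 10)).
exact = 0.5 * (math.erf((97 - 100) / 10) - math.erf((95 - 100) / 10))
print(round(grid, 5), round(exact, 5))
```

The grid total comes out close to .0957, and the exact normal value is slightly larger, about .0959, since the grid smooths each class evenly.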

When the procedure just described is applied to finding the
probabilities of sample means lying in the
intervals 75-, 77-, etc.,
variance of the population. Furthermore, the figure suggests
that the sampling distribution of the mean has the form of a
normal frequency curve.

These indications of the numerical analysis are verified by
algebraic analysis.¹ This shows that in general the sampling
distribution of the mean is normal in form, that its mean is the
mean of the population from which the samples have been
drawn, and that its variance is 1/Nth the variance of the
population. The algebraic equation for the sampling distribution
of the mean is thus

$$dP(\bar{X}) = \frac{1}{\tilde{\sigma}_{\bar{X}}\sqrt{2\pi}}\exp\left[-\frac{(\bar{X} - \tilde{X})^2}{2\tilde{\sigma}_{\bar{X}}^2}\right]d\bar{X} \qquad (1)$$

where

$$\tilde{\sigma}_{\bar{X}}^2 = \frac{\tilde{\sigma}^2}{N}$$

Fig. 66.—Comparison of a normal population with the sampling distribution of means (N = 2). Data from Table 26.

¹ See Appendix to this chapter (p. 251).
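The statement that the mean of the sampling distribution equals the population mean and that its variance is 1/Nth the population variance can be illustrated by simulation. This is a sketch under stated assumptions: a normal population with mean 100 and standard deviation 10, 50,000 simulated samples per size, and a fixed seed; nothing here comes from the text's own computations.

```python
import random
import statistics

random.seed(2)
MEAN, SD = 100.0, 10.0

def simulated_means(n, reps=50_000):
    """Means of `reps` random samples of size n from a normal population."""
    return [statistics.fmean(random.gauss(MEAN, SD) for _ in range(n)) for _ in range(reps)]

results = {}
for n in (2, 11):
    means = simulated_means(n)
    results[n] = (statistics.fmean(means), statistics.pvariance(means))
    print(n, round(results[n][0], 2), round(results[n][1], 2), "theory:", MEAN, SD**2 / n)
```

For samples of 2 the simulated variance of the means should fall near 50, and for samples of 11 near 9.09.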
TABLE 26.—COMPARISON OF THE SAMPLING DISTRIBUTION OF THE MEAN
WITH THE DISTRIBUTION OF THE POPULATION

                        Probabilities of
Lower limits of         Sample means      Population
class intervals         (N = 2)

(1)                     (2)               (3)
75-                     .00036            .00451
77-                     .00093            .00714
79-                     .00215            .01086
81-                     .00456            .01585
83-                     .00895            .02224
85-                     .01619            .02999
87-                     .02705            .03887
89-                     .04177            .04839
91-                     .05960            .05790
93-                     .07856            .06658
95-                     .09571            .07355
97-                     .10773            .07808
99-                     .11206            .07968
101-                    .10773            .07808
103-                    .09571            .07355
105-                    .07856            .06658
107-                    .05960            .05790
109-                    .04177            .04839
111-                    .02705            .03887
113-                    .01619            .02999
115-                    .00895            .02224
117-                    .00456            .01585
119-                    .00215            .01086
121-                    .00093            .00714
123-                    .00036            .00451


values for their variances, 
intervals and records the rel. 
pointed out above,¹ all sample points equally distant from the line
CD (in a perpendicular direction) have the same variances.
Thus a set of lines can be drawn parallel to the line CD, and by
computing the relative frequency, or probability, of the samples
falling between various pairs of lines, the relative frequency, or
probability, of a sample having a given variance can be
determined. The procedure may be illustrated by an example.

The sample 98,94 has a mean of 96 and a variance of 4. The
point representing this sample in Fig. 64 lies on the line EF
running parallel to line CD. According to the previous
paragraph, all other sample points lying on the line EF have the same
variance. The sample 100,92 has a mean of 96 and a variance
of 16. The point representing this sample in Fig. 64 lies on the
line GH, also running parallel to CD. All sample points lying
on line GH therefore have a variance of 16. It also follows that
all sample points lying between lines EF and GH have variances
lying between 4 and 16. Furthermore, since all points on the
line E′F′ of Fig. 64 are the same distance from line CD as those
on line EF, they also have sample variances of 4, and likewise
all points on the line G′H′ have sample variances of 16. Hence
the total set of samples whose variances lie between 4 and 16
consists of all the samples lying between the lines EF and GH
on the one hand and the lines E′F′ and G′H′ on the other hand.

The relative frequency, or probability, of a sample having a
variance between 4 and 16 is therefore the relative frequency, or
probability, of a sample lying between EF and GH, plus the
relative frequency, or probability, of a sample lying between
E′F′ and G′H′. As in the case of the means, this total
probability may be computed directly from Fig. 64 by adding the
probabilities of all the cells and sections of cells included between
these two pairs of lines. The result in this particular case is
found to be 0.20510. It is suggested that the reader check this
by carrying out the calculations himself.

By application of the foregoing procedure, the probabilities of
samples having variances lying between other class limits may
readily be computed. The results are presented in Table 27 and
pictured graphically in Fig. 67, which represents the sampling
distribution of the variance of samples of 2. Owing to the small
size of the sample, the shape of the distribution is unusual. A
more common shape is shown by the distribution of samples of 11,
which also is illustrated in Fig. 67.

¹ See pp. 227-228.
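For samples of 2 the variance classes can also be computed in closed form, which gives a check on the grid figures. The sketch assumes a continuous normal population with σ̃ = 10: with divisor N = 2 the sample variance is (X₁ − X₂)²/4, and X₁ − X₂ is normal with mean 0 and variance 2σ̃², so P(σ² < v) = erf(√v/σ̃). Table 27 was obtained from the grid of Fig. 64, so small differences in the fourth decimal place are to be expected.

```python
import math

SIGMA_POP = 10.0   # population standard deviation

def cdf_variance_n2(v):
    """P(sample variance < v) for samples of 2 from a normal population.

    With divisor N = 2 the variance is (X1 - X2)**2 / 4, and X1 - X2 is
    normal with mean 0 and variance 2 * SIGMA_POP**2, which gives erf(sqrt(v) / SIGMA_POP).
    """
    return math.erf(math.sqrt(v) / SIGMA_POP)

limits = [0, 4, 16, 36, 64, 100]   # the first variance class limits of Table 27
probs = [cdf_variance_n2(hi) - cdf_variance_n2(lo) for lo, hi in zip(limits, limits[1:])]
for lo, p in zip(limits, probs):
    print(f"{lo:>3}-  {p:.5f}")
```

The class 4- to 16- comes out near .2057, against the grid value .20510 obtained in the text.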


Fig. 67.—Effect of size of sample on the sampling distribution of variances. (Distribution of variances of samples of two and samples of eleven.)
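Algebraic analysis¹ shows that the sampling distribution of
variances of samples of N has the following equation:

$$dP(\sigma^2) = \frac{\left(\dfrac{N}{2\tilde{\sigma}^2}\right)^{\frac{N-1}{2}}}{\Gamma\!\left(\dfrac{N-1}{2}\right)}\,(\sigma^2)^{\frac{N-3}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde{\sigma}^2}\right]d\sigma^2 \qquad (2)$$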


For N greater than 3, this represents a curve that starts at 0,
rises to a peak at σ² = [(N − 3)/N]σ̃², and approaches 0 again as σ²
goes to infinity. For small values of N the curve is thus
skewed. For large values of N, however, the curve is more
symmetrical and almost normal in form, the mean of the curve
being approximately the variance of the population σ̃² and its
standard deviation being σ̃²√(2/N).


For small samples, the sampling distribution of the variance
is thus nonnormal, and use cannot be made of the normal
probability tables in testing hypotheses and determining
confidence limits. It can be shown, however, that if the unit
of measurement is taken as 1/N times the variance of the population
(that is, σ̃²/N), then the distribution takes on the form of the χ²

1 See Appendix to this chapter (p. 263). 
TABLE 27.—SAMPLING DISTRIBUTION OF THE VARIANCE AND STANDARD
DEVIATION OF SAMPLES OF 2 FROM A NORMAL POPULATION WHOSE
σ̃ = 10
(N = 2)

Values of
σ         σ²        Probability
0-        0-        .22193
2-        4-        .20510
4-        16-       .17515
6-        36-       .13820
8-        64-       .10078
10-       100-      .06790
12-       144-      .04228
14-       196-      .02432
16-       256-      .01291
18-       324-      .00632
20-       400-      .00285
22-       484-      .00118
24-       576-      .00042

TABLE 28.—SAMPLING DISTRIBUTION OF VARIANCES AND STANDARD
DEVIATIONS OF SAMPLES OF 11 FROM A NORMAL POPULATION WHOSE
σ̃ = 10

Values of
σ         σ²        Probability
0-        0-        .00008
2-        4-        .00248
4-        16-       .04847
6-        36-       .22713
8-        64-       .36406
10-       100-      .25270
12-       144-      .08718
14-       196-      .01592
16-       256-      .00095
18-       324-      .00005
Total               .99902
distribution. More specifically, it can be shown that the
quantity Nσ²/σ̃² has a sampling distribution that is of
the form of the χ² distribution with n in the χ² equation equal
to N − 1. Tables of the χ² distribution can therefore be used
in computing probabilities of various sample variances. This
use of the χ² tables will be explained more fully below.¹
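The χ² relationship can be tried on Table 28. For samples of 11 the quantity Nσ²/σ̃² has n = 10 degrees of freedom, and for an even number of degrees of freedom the χ² probabilities have a closed form, so the sketch below needs only the standard library. The helper names are hypothetical, and σ̃ = 10 is taken from the table heading.

```python
import math

def chi2_cdf_even(x, df):
    """CDF of the chi-square distribution for an even number of degrees of freedom.

    Uses the closed form 1 - exp(-x/2) * sum_{k < df/2} (x/2)**k / k!.
    """
    if df % 2:
        raise ValueError("this closed form needs even df")
    y = x / 2
    return 1 - math.exp(-y) * sum(y**k / math.factorial(k) for k in range(df // 2))

N, SIGMA_POP = 11, 10.0
DF = N - 1

def prob_variance_between(lo, hi):
    """P(lo <= sample variance < hi) using N * variance / SIGMA_POP**2 ~ chi-square(N - 1)."""
    scale = N / SIGMA_POP**2
    return chi2_cdf_even(hi * scale, DF) - chi2_cdf_even(lo * scale, DF)

for lo, hi in [(36, 64), (64, 100), (100, 144)]:   # rows 6-, 8-, 10- of Table 28
    print(f"{lo:>3}-{hi:<3} {prob_variance_between(lo, hi):.5f}")
```

The three printed probabilities agree with the corresponding rows of Table 28 to within about .0005.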

The sampling distribution of the standard deviation of samples
of 2 is found from Table 27 by taking the intervals 0-, 2-, 4-, 6-,
8-, etc., the limits being the square root of the limits adopted for the
variance. The result is shown in Fig. 68. The figure also gives
the distribution of the standard deviation for samples of 11.


Fig. 68.—Effect of size of sample on sampling distribution of the standard deviation. (Distribution of standard deviations of samples of two and samples of eleven.)
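The equation for the sampling distribution of the standard
deviation is

$$dP(\sigma) = \frac{2\left(\dfrac{N}{2\tilde{\sigma}^2}\right)^{\frac{N-1}{2}}}{\Gamma\!\left(\dfrac{N-1}{2}\right)}\,\sigma^{N-2}\exp\left[-\frac{N\sigma^2}{2\tilde{\sigma}^2}\right]d\sigma \qquad (3)$$

This equation is not used in practical analysis since it is easier
to work with the variance σ² and to use tables of the χ²
distribution as just explained.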
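Equation (3) can be checked against the variance results by numerical integration. This sketch integrates the density with a composite Simpson's rule (the choice of integrator is ours, not the text's) over σ from 2 to 4 for samples of 2, where the closed-form answer is erf(0.4) − erf(0.2), and over σ from 8 to 10 for samples of 11, which should reproduce the χ² value for the 64- to 100- variance class.

```python
import math

def sd_density(s, n, sigma_pop):
    """Density of the sample standard deviation s for samples of n, equation (3)."""
    c = n / (2 * sigma_pop ** 2)
    return 2 * c ** ((n - 1) / 2) / math.gamma((n - 1) / 2) * s ** (n - 2) * math.exp(-c * s * s)

def simpson(f, a, b, steps=1000):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

p2 = simpson(lambda s: sd_density(s, 2, 10.0), 2.0, 4.0)     # N = 2, s between 2 and 4
p11 = simpson(lambda s: sd_density(s, 11, 10.0), 8.0, 10.0)  # N = 11, s between 8 and 10
print(round(p2, 5), round(math.erf(0.4) - math.erf(0.2), 5), round(p11, 5))
```

Since σ lies between 2 and 4 exactly when σ² lies between 4 and 16, the two routes must agree, and they do to many decimal places.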


SAMPLING DISTRIBUTION OF √N (X̄ − X̃)/σ̂

The sampling distribution of √N (X̄ − X̃)/σ̂ gives the relative
frequencies with which samples take on various values of this
statistic. In numerical cases, it lists intervals of this statistic
and gives the relative frequencies of samples belonging to these
intervals.

¹ See pp. 284-289.

It was pointed out above¹ that if a line is drawn through a
sample point connecting it with the central point (X̃,X̃), the
value of the statistic √N (X̄ − X̃)/σ̂ for the sample is equal to the
cotangent of the angle that this line makes with CD if the
sample point lies above CD, or to minus the cotangent of this
angle if the sample point lies below CD, all angles to be read
counterclockwise.

The relative frequency, or probability, of samples having
values of √N (X̄ − X̃)/σ̂ lying between certain limits is
accordingly the relative frequency with which samples fall between
the radii through the point (X̃,X̃) whose cotangents are equal
to these limits. For example, in Fig. 64, the cotangent of the
angle that LL makes with CD is 2.0, and the cotangent of the
angle that KK makes with CD is 1.8; also, the cotangents of
the angles that L′L′ and K′K′ make with CD are −2.0 and −1.8,
respectively. Hence, the samples for which √N (X̄ − X̃)/σ̂ lies
between 1.8 and 2.0 are the samples included between the
lines LL and KK and lying above CD and the samples included
between the lines L′L′ and K′K′ and lying below CD. Similarly,
the samples for which √N (X̄ − X̃)/σ̂ lies between −1.8 and −2.0
are the samples included between LL and KK and lying below
CD and the samples included between L′L′ and K′K′ and lying
above CD. Since the various sample points are distributed with a
circular symmetry about the point (X̃,X̃), it follows that the
relative frequency of samples included between LL and KK and
lying above CD is proportional to the angle between these two lines.
More specifically, it is equal to the ratio that this angle bears to
360 deg. Since in Fig. 64 angle LXD = 26.56 deg and angle
KXD = 29.05 deg, and hence the difference equals 2.49 deg, it
follows that the relative frequency, or probability, of samples
included between the lines LL and KK and lying above CD is
equal to

$$\frac{2.49}{360} = .0069$$

Owing to the symmetry of the probabilities represented by Fig. 64,
this is also the probability of a sample included between L′L′ and
K′K′ and lying below CD. Hence the probability of a sample
having a value of √N (X̄ − X̃)/σ̂ between 1.8 and 2.0 is .0138;
similarly, the probability of a sample having a value of that
statistic between −1.8 and −2.0 is .0138.

¹ See pp. 224-225.
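The angle calculation can be reproduced in a few lines and compared with the t distribution for n = N − 1 = 1, whose probabilities are given by arctangents (the Cauchy form). The sketch below is illustrative; the helper name is hypothetical.

```python
import math

def angle_with_cd(cot):
    """Angle in degrees (between 0 and 180) whose cotangent is `cot`."""
    return math.degrees(math.atan2(1.0, cot))

wedge = angle_with_cd(1.8) - angle_with_cd(2.0)   # between KK and LL, about 2.49 deg
p_above = wedge / 360                              # wedge above CD
p_total = 2 * p_above                              # plus the matching wedge below CD

# For N = 2 the statistic follows the t distribution with n = 1 (the Cauchy form).
p_t1 = (math.atan(2.0) - math.atan(1.8)) / math.pi
print(round(wedge, 2), round(p_above, 4), round(p_total, 4), round(p_t1, 4))
```

The two routes give the same probability, about .0138, because the angle whose cotangent is c equals 90 deg minus the angle whose tangent is c.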


Fig. 69.—A t distribution with n = N − 1 = 2 − 1 = 1. Data from Table 29.

The sampling distribution of √N (X̄ − X̃)/σ̂ can therefore be
readily obtained for samples of 2 as follows: Find the angles for
which the cotangents are, say, 0.0, 0.05, 0.15, 0.25, 0.35, etc.;
take the successive differences between these angles, and double
the results; finally divide these results by 360. This quotient,
for each successive interval, will be the probability of a sample
having a value of √N (X̄ − X̃)/σ̂ lying between 0 and 0.05, between
0.05 and 0.15, between 0.15 and 0.25, between 0.25 and 0.35, etc.
Owing to the symmetry of the probabilities represented in Fig. 64,
the probabilities of negative values of √N (X̄ − X̃)/σ̂ are the
SAMPLING FROM CONTINUOUS NORMAL POPULATIONS 239 


TABLE 29.—SAMPLING DISTRIBUTION OF THE STATISTIC √N (X̄ − X̃)/σ̂
FOR SAMPLES OF TWO FROM A NORMAL POPULATION
(X̃ = 100 and σ̃ = 10)

(1)         (2)         (3)         (4)            (5)           (6)
Cotangent   Angle       Angle       Twice angle    Class         Probabilities
of angle    with line   intervals   intervals      intervals
with line   CD (deg)    (deg)       divided by     in the
CD                                  360 deg        statistic

 −.05       92.862
  .00       90.000      5.724       .03180         −.05-         .03180
  .05       87.138
                        5.669       .03149          .05-         .03149
  .15       81.469
                        5.506       .03059          .15-         .03059
  .25       75.963
                        5.253       .02918          .25-         .02918
  .35       70.710
                        4.939       .02744          .35-         .02744
  .45       65.771
                        4.580       .02544          .45-         .02544
  .55       61.191
                        4.215       .02342          .55-         .02342
  .65       56.976
                        3.846       .02137          .65-         .02137
  .75       53.130
                        3.493       .01940          .75-         .01940
  .85       49.637
                        3.167       .01759          .85-         .01759
  .95       46.470
                        2.867       .01593          .95-         .01593
 1.05       43.603
                        2.593       .01440         1.05-         .01440
 1.15       41.010
                        2.350       .01306         1.15-         .01306
 1.25       38.660
                        2.132       .01184         1.25-         .01184
 1.35       36.528
                        1.936       .01076         1.35-         .01076
 1.45       34.592
                        1.763       .00979         1.45-         .00979
 1.55       32.829
                        1.611       .00895         1.55-         .00895
 1.65       31.218
                        1.475       .00819         1.65-         .00819
 1.75       29.743
                        1.355       .00753         1.75-         .00753
 1.85       28.388
                        1.238       .00688         1.85-         .00688
 1.95       27.150
                        1.150       .00639         1.95-         .00639
 2.05       26.000
                        1.056       .00587         2.05-         .00587
 2.15       24.944
                         .980       .00544         2.15-         .00544
 2.25       23.964
                         .914       .00508         2.25-         .00508
 2.35       23.050
                         .850       .00472         2.35-         .00472
 2.45       22.200
                         .785       .00436         2.45-         .00436
 2.55       21.415
                         .744       .00413         2.55-         .00413
 2.65       20.671
                         .691       .00384         2.65-         .00384
 2.75       19.980
                         .642       .00357         2.75-         .00357
 2.85       19.338
                         .614       .00341         2.85-         .00341
 2.95       18.724
                         .568       .00316         2.95-         .00316
 3.05       18.156
                         .546       .00303         3.05-         .00303
 3.15       17.610
                         .505       .00280         3.15-         .00280
 3.25       17.105
                         .486       .00270         3.25-         .00270
 3.35       16.619
                         .454       .00252         3.35-         .00252
 3.45       16.165
                         .432       .00240         3.45-         .00240
 3.55       15.733
                         .413       .00229         3.55-         .00229
 3.65       15.320
                         .389       .00216         3.65-         .00216
 3.75       14.931
                         .370       .00206         3.75-         .00206
 3.85       14.561
                         .354       .00197         3.85-         .00197
 3.95       14.207
                         .339       .00188         3.95-         .00188
 4.05       13.870
                         .321       .00178         4.05-         .00178
 4.15       13.549
                         .308       .00171         4.15-         .00171
 4.25       13.241
                         .295       .00164         4.25-         .00164
 4.35       12.946
                         .281       .00156         4.35-         .00156
 4.45       12.665
                         .270       .00150         4.45-         .00150
 4.55       12.395
                         .257       .00143         4.55-         .00143
 4.65       12.138
                         .250       .00139         4.65-         .00139


same as the probabilities of positive values. The sampling
distribution of √N (X̄ − X̃)/σ̂ is thus symmetrical about zero.

The calculations here indicated are carried out in Table 29 and
the results pictured graphically in Fig. 69.
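The procedure behind Table 29 is mechanical and can be sketched directly: compute the angle for each boundary cotangent, take successive differences, double them, and divide by 360. The sketch below uses unrounded angles, so the third decimal of column (3) can differ slightly from the printed table; the helper names are illustrative.

```python
import math

def angle_with_cd(cot):
    """Angle in degrees, counterclockwise from CD, whose cotangent is `cot`."""
    return math.degrees(math.atan2(1.0, cot))

# Class boundaries of Table 29: -.05, .05, .15, ..., 4.75
bounds = [-0.05] + [round(0.05 + 0.1 * k, 2) for k in range(48)]

table = []
for lo, hi in zip(bounds, bounds[1:]):
    interval = angle_with_cd(lo) - angle_with_cd(hi)   # column (3)
    prob = 2 * interval / 360                           # columns (4) and (6)
    table.append((lo, round(interval, 3), round(prob, 5)))

for row in table[:3] + table[-1:]:
    print(row)
```

The first rows reproduce .03180 and .03149, and the last row, for the class 4.65-, comes out within about .00001 of the tabled .00139.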
Algebraic analysis shows that, in general, the sampling
distribution of the statistic √N (X̄ − X̃)/σ̂ is of the form of the t
(or Student's) distribution, with the n in the equation equal to
N − 1. For samples of any size the equation for the t
distribution is as follows:¹

$$dP\left[t = \frac{\sqrt{N}\,(\bar{X} - \tilde{X})}{\hat{\sigma}}\right] = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\;\Gamma\!\left(\frac{n}{2}\right)}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}dt \qquad (4)$$

where n = N − 1.
As might have been inferred from the numerical analysis,
the sampling distribution of √N (X̄ − X̃)/σ̂ is independent of the
hypothetical values assumed for the mean or standard deviation
of the population and varies only with N, the size of the sample.
For large values of N the curve is approximately normal with
a mean of 0 and a variance approximately equal to 1. In these
cases the normal probability table can be used in place of the
t table.
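The approach of the t curve to the normal curve can be seen by integrating equation (4) numerically. This sketch (Simpson's rule and the cutoff ±1.96 are our choices for illustration) computes the probability that t lies between −1.96 and +1.96 for n = 1, 10, and 100; for a unit normal variate the corresponding probability is about .95.

```python
import math

def t_density(t, n):
    """Equation (4): density of the t distribution with n degrees of freedom."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c) * (1 + t * t / n) ** (-(n + 1) / 2)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

central = {}
for n in (1, 10, 100):
    central[n] = simpson(lambda t: t_density(t, n), -1.96, 1.96)
    print(n, round(central[n], 4))
```

The central probability rises from about .70 for n = 1 toward .95 as n grows, which is why the normal table can replace the t table for large samples.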


SAMPLING DISTRIBUTIONS OF OTHER STATISTICS

In estimating the properties of a normal population the
sampling distributions of the mean, √N (X̄ − X̃)/σ̂, and the
variance (or standard deviation) are the most important. For a
normal distribution is characterized by its mean and standard
deviation, and these are best estimated by the above sampling
distributions.

¹ See Appendix to this chapter (p. 266).

It is also possible to estimate the mean of the population from the median of a sample. In the absence of knowledge of the sample mean, the median would be the next best statistic. As in the case of the mean, the sampling distribution of the median is normal (approximately so, for large samples), and its mean is the mean (= median) of the population. The standard error of the median, however, equals $1.2533\,\tilde\sigma/\sqrt{N}$, which is about 1¼ times as large as the standard error of the mean. It is this greater sampling error that makes the median less "efficient" than the mean in setting up confidence limits for the mean (= median) of the population. For a given confidence coefficient, a confidence interval derived from the median will be about 1¼ times as wide as one derived from the mean. It will therefore not give as good an estimate of the population mean, or median.
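The 1¼ factor can be checked by a small sampling experiment. The sketch below is a Monte Carlo illustration, not the method by which the standard error is derived. It draws repeated samples of 101 from a unit normal population and compares the observed spread of the sample medians with that of the sample means. The ratio should come out near $\sqrt{\pi/2} \approx 1.2533$, slightly less at this finite N. The sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

def spread_of_mean_and_median(n: int, reps: int, seed: int = 1) -> tuple[float, float]:
    """Observed standard deviations of sample means and of sample medians over
    `reps` samples of size n from a unit normal population."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pstdev(means), statistics.pstdev(medians)

se_mean, se_median = spread_of_mean_and_median(n=101, reps=4000)
print(se_mean, 1 / math.sqrt(101))                  # standard error of the mean
print(se_median / se_mean, math.sqrt(math.pi / 2))  # ratio, about 1.25
```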

In instances where time and economy of effort are important, it is also possible to estimate the variance (or standard deviation) of a normal population from the sample range.¹ It has been found possible to derive the mean range, the standard deviation of the range, and the upper and lower .001, .005, .010, .025, .050, and .100 points of the sampling distribution of the range in samples of 2 to 20 cases.² These data are reproduced in Table XIII of the Appendix. The unit for the table is the standard deviation of the population, so the table can be used to estimate the population standard deviation from the sample range. The procedure is discussed on pages 294-296.
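Table XIII is not reproduced here, but the idea behind it can be sketched by simulation. The code below estimates the mean range of samples of 5 from a unit normal population; the commonly quoted value is about 2.326 standard deviations. It then divides an average observed range by that mean range to estimate σ. The observed ranges are made-up numbers used only to show the arithmetic. In practice the tabled mean range would be used rather than a simulated one.

```python
import random
import statistics

def simulated_mean_range(n: int, reps: int, seed: int = 2) -> float:
    """Average range (largest minus smallest case) of `reps` samples of size n
    drawn from a normal population with standard deviation 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(sample) - min(sample)
    return total / reps

d5 = simulated_mean_range(5, 20000)
print(d5)  # should be near the commonly quoted 2.326

# Hypothetical ranges observed in five samples of 5 (made-up numbers)
observed_ranges = [4.1, 6.3, 5.0, 3.8, 5.6]
sigma_hat = statistics.fmean(observed_ranges) / d5
print(sigma_hat)  # estimate of the population standard deviation
```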

The sample statistics $\sqrt{\beta_1}$ and $\beta_2$ are important as measures of departure from normality. If the population is normal, the sampling distributions of these statistics tend to normality with increasing size of the sample. Their means are 0 and 3, and their standard deviations are approximately $\sqrt{6/N}$ and $\sqrt{24/N}$, respectively. Hence, if a sample is large, departures from normality may be tested by treating

$$\frac{\sqrt{\beta_1}}{\sqrt{6/N}} \qquad \text{and} \qquad \frac{\beta_2 - 3}{\sqrt{24/N}}$$

as normally distributed variates with mean 0 and standard deviation 1.
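The two large-sample tests can be written out directly. This sketch computes $\sqrt{\beta_1}$ and $\beta_2$ from moments about the sample mean with divisor N, taking $\sqrt{\beta_1}$ with the sign of the third moment. It then forms the two normal deviates given above. The function name is hypothetical. The tiny example only illustrates the arithmetic, since the normal approximation itself is intended for large samples.

```python
import math

def moment_tests(xs: list[float]) -> dict[str, float]:
    """sqrt(beta1), beta2 and their large-sample normal deviates,
    using moments about the sample mean with divisor N."""
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n
    mu3 = sum((x - mean) ** 3 for x in xs) / n
    mu4 = sum((x - mean) ** 4 for x in xs) / n
    sqrt_b1 = mu3 / mu2 ** 1.5          # signed square root of beta1
    b2 = mu4 / mu2 ** 2
    return {
        "sqrt_b1": sqrt_b1,
        "b2": b2,
        "z_skew": sqrt_b1 / math.sqrt(6 / n),
        "z_kurt": (b2 - 3) / math.sqrt(24 / n),
    }

print(moment_tests([1, 2, 3, 4, 5]))  # symmetrical sample: sqrt_b1 = 0, b2 = 1.7
```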


¹ The range is the difference between the largest and smallest cases in a sample.
² Tippett, E. S. Pearson, and H. O. Hartley have been mainly responsible for deriving these data. See footnote to Table XIII in the Appendix.
³ Statistical Methods for Research Workers, Appendix, Chap. III.

Because of the complexity of the exact formulas for the means and standard deviations of the sampling distributions of $\sqrt{\beta_1}$ and $\beta_2$, R. A. Fisher³ suggests the use of certain "k statistics" in place of the moments of a sample distribution and certain "g statistics" in place of the sample $\beta$ statistics. The k statistics¹ are defined as follows:


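$$k_2 = \frac{N\mu_2}{N-1}, \qquad k_3 = \frac{N^2\mu_3}{(N-1)(N-2)}, \qquad k_4 = \frac{N^2\left[(N+1)\mu_4 - 3(N-1)\mu_2^2\right]}{(N-1)(N-2)(N-3)}$$

where $\mu_2$, $\mu_3$, and $\mu_4$ are the second, third, and fourth moments of the sample about its mean.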


The g statistics bear the same relationship to the k statistics as the $\beta$ statistics do to the sample moments, viz.,

$$g_1 = \frac{k_3}{k_2^{3/2}}, \qquad g_2 = \frac{k_4}{k_2^{2}}$$


For large samples the sampling distributions of the g statistics tend to be normal in form, with means of zero. Their standard deviations $\sigma_{g_1}$ and $\sigma_{g_2}$ are given exactly by

$$\sigma_{g_1}^2 = \frac{6N(N-1)}{(N-2)(N+1)(N+3)}, \qquad \sigma_{g_2}^2 = \frac{24N(N-1)^2}{(N-3)(N-2)(N+3)(N+5)}$$
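The k and g statistics translate directly into code. The sketch below follows the definitions above, again using moments about the sample mean with divisor N, and evaluates the exact standard deviations of $g_1$ and $g_2$ in normal samples. The function names are illustrative. For a large N, the first standard deviation should be close to $\sqrt{6/N}$.

```python
import math

def k_and_g(xs: list[float]) -> dict[str, float]:
    """Fisher's k2, k3, k4 and g1, g2 from moments about the sample mean (divisor N)."""
    n = len(xs)
    mean = sum(xs) / n
    mu2, mu3, mu4 = (sum((x - mean) ** r for x in xs) / n for r in (2, 3, 4))
    k2 = n * mu2 / (n - 1)
    k3 = n**2 * mu3 / ((n - 1) * (n - 2))
    k4 = n**2 * ((n + 1) * mu4 - 3 * (n - 1) * mu2**2) / ((n - 1) * (n - 2) * (n - 3))
    return {"k2": k2, "k3": k3, "k4": k4, "g1": k3 / k2**1.5, "g2": k4 / k2**2}

def g_standard_errors(n: int) -> tuple[float, float]:
    """Exact standard deviations of g1 and g2 in samples of n from a normal population."""
    var_g1 = 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))
    var_g2 = 24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5))
    return math.sqrt(var_g1), math.sqrt(var_g2)

print(k_and_g([1, 2, 3, 4, 5]))   # k3 = g1 = 0 for this symmetrical sample
print(g_standard_errors(5))
print(g_standard_errors(1000)[0], math.sqrt(6 / 1000))  # nearly equal for large N
```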


The use of the sampling distributions of $\sqrt{\beta_1}$ and $\beta_2$, or of $g_1$ and $g_2$, to test departures from normality will be discussed in the next chapter.

For small samples, the sampling distributions of $\sqrt{\beta_1}$ and $\beta_2$ have been approximated by fitting Pearsonian curves having the same moments as those of the exact sampling distributions.

¹ The mean values of the k's are the "cumulants" of the population. Cf. pp. 83-84.

244 ELEMENTARY THEORY OF RANDOM SAMPLING

For samples of 25 to 100, the 5 per cent and 10 per cent limits of the sampling distribution of $\sqrt{\beta_1}$ have been determined approximately by P. Williams.¹ These are reproduced in Table X of the Appendix.²
It is interesting to compare the limits of this table with those obtained by assuming that $\sqrt{\beta_1}$ has a normal sampling distribution with a mean of zero and a standard deviation of $\sqrt{6/N}$. For example, for samples of 100, the 5 per cent point given in the table is 0.389, while the assumption of a normal sampling distribution gives 0.403. This suggests that for samples of 100 or more a normal sampling distribution gives fairly good results.


Approximate values for the upper and lower 1 per cent and 5 per cent points of the sampling distribution of $\beta_2$ have been computed³ and are reproduced in Table XI of the Appendix.⁴
By comparison, the 5 per cent points derived from the assumption that $\beta_2$ is normally distributed with a mean of 3 and a standard deviation of $\sqrt{24/N}$ are 2.19 and 3.81 for samples of 100, and 2.64 and 3.36 for samples of 500. The table values are 2.35 and 3.77 for samples of 100, and 2.57 and 3.37 for samples of 500. This shows that for samples of 500 or more the assumption of a normal sampling distribution gives results that would appear to be fairly reliable. For samples of 100, the assumption of a normal sampling distribution gives a lower limit that is somewhat below that given by the table.
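The normal-approximation limits quoted in the last two paragraphs can be reproduced in a few lines of Python. The sketch takes the one-tailed 5 per cent point of the unit normal curve (about 1.645) from the standard library and applies it to the standard deviations $\sqrt{6/N}$ and $\sqrt{24/N}$. It does not reproduce the exact values of Tables X and XI, which come from fitted curves.

```python
import math
from statistics import NormalDist

Z05 = NormalDist().inv_cdf(0.95)  # one-tailed 5 per cent point, about 1.645

def sqrt_b1_upper_5pct(n: int) -> float:
    """Upper 5 per cent point of sqrt(beta1) if it were normal with s.d. sqrt(6/N)."""
    return Z05 * math.sqrt(6 / n)

def b2_limits_5pct(n: int) -> tuple[float, float]:
    """Lower and upper 5 per cent points of beta2 if it were normal about 3 with s.d. sqrt(24/N)."""
    half_width = Z05 * math.sqrt(24 / n)
    return 3 - half_width, 3 + half_width

print(round(sqrt_b1_upper_5pct(100), 3))           # 0.403
print([round(v, 2) for v in b2_limits_5pct(100)])  # [2.19, 3.81]
print([round(v, 2) for v in b2_limits_5pct(500)])  # [2.64, 3.36]
```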

For practical use, Tables X and XI of the Appendix have been put in chart form. This permits ready interpolation for values not listed in the tables.


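To test departure from normality with respect to kurtosis, tables have also been constructed⁵ for the sampling distribution of the statistic

$$a = \frac{\text{A.D. (calculated from the mean)}}{\sigma}$$

where A.D. is the average deviation of the sample and $\sigma$ its standard deviation.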


¹ "Note on the Sampling Distribution of $\sqrt{\beta_1}$, where the Population Is Normal," Biometrika, Vol. 27 (1935), pp. 269-271.
² See p. 480.
³ Pearson, E. S., "A Further Development of Tests for Normality," Biometrika, Vol. 22 (1930-1931), pp. 239-249.
⁴ See p. 480.
⁵ Geary, R. C., and E. S. Pearson, Tests of Normality, pp. 13-15.

The sampling distribution of $\beta_2$ is quite skewed for $N < 200$, and the accuracy of the tabled probability levels for $\beta_2$ cannot be determined without further investigation. By contrast, the approximation involved in the calculation of probability levels for $a$ is very close, even when $N$ is as small as 10.¹ For this reason the use of $a$ in place of $\beta_2$ to test for the existence of kurtosis is recommended. Probability levels for $a$ have been worked out² and are given in Table XII of the Appendix. The use of this table will be illustrated in the next chapter.
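A sketch of Geary's ratio follows. It computes $a$ as the average absolute deviation from the sample mean divided by the standard deviation (divisor N). For a large sample from a normal population, $a$ should be near $\sqrt{2/\pi} \approx 0.798$. The probability levels of Table XII are not reproduced here.

```python
import math
import random

def geary_a(xs: list[float]) -> float:
    """Average deviation from the sample mean divided by the standard deviation (divisor N)."""
    n = len(xs)
    mean = sum(xs) / n
    avg_dev = sum(abs(x - mean) for x in xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return avg_dev / sd

print(geary_a([1, 2, 3, 4, 5]))  # 1.2 / sqrt(2), about 0.8485

rng = random.Random(3)
big = [rng.gauss(0.0, 1.0) for _ in range(20000)]
print(geary_a(big), math.sqrt(2 / math.pi))  # both about 0.80
```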


SUMMARY 


The fundamental basis of any sampling analysis consists in a comparison of a sample with the set of all possible samples that may be derived from some hypothetical population. The basis of the comparison will vary from problem to problem. The present chapter, which is concerned with sampling from a normal population, has discussed four principal sample measurements that might be used to make this comparison. These are the mean of the sample, the variance and standard deviation of the sample, and the sample statistic $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$.

When all possible samples from a normal population are considered with reference to their mean values, it is found that the set of sample means forms a normal frequency distribution. The mean of this distribution is the mean of the population, and its variance is the variance of the population divided by N. This distribution of the sample means is called the "sampling distribution of the mean." It is to be noted that its shape and position depend on the values of the population mean and variance.

¹ See ibid., p. 2.
² Geary, R. C., "Moments of the Ratio of the Mean Deviation to the Standard Deviation for Normal Samples," Biometrika, Vol. 28 (1936), pp. 295-307.

When the set of all possible samples is described in terms of the variances of the samples, it is found that the sample variances make up a skewed frequency distribution. If $\tilde\sigma^2/N$ is taken as the unit of measurement, this distribution is of the form of the $\chi^2$ distribution, with $n$ in the $\chi^2$ formula equal to $N - 1$. This is the "sampling distribution of the variance." The shape and position of this distribution depend only on the value of the population variance (and on N) and are independent of the value of the population mean. If the sample is large, say 30 or more, the sampling distribution of the variance is almost normal in form. The mean of the distribution is then approximately equal to the variance of the population, and the variance of the distribution is approximately equal to the square of the variance of the population times $2/N$, that is, $2\tilde\sigma^4/N$.
When the set of all possible samples is described in terms of the sample values of the statistic $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$, it is found that these sample values make up a frequency distribution of the form of the $t$ distribution, with $n$ equal to $N - 1$. This is the "sampling distribution" of the statistic $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$. Since the form and shape of the $t$ distribution depend only on the value of $n$ (here equal to $N - 1$), it follows that the sampling distribution of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ is independent of the values of the population mean and variance. For large samples, say 30 or more, the $t$ distribution is almost normal, with a mean of zero and a standard deviation of approximately unity.

Attention was also called to the sampling distributions of such statistics as the median and the range, $\sqrt{\beta_1}$ and $\beta_2$, $g_1$ and $g_2$, and $a = \text{A.D.}/\sigma$.


APPENDIX A
It is the purpose of this appendix to derive the sampling distributions of the mean, the variance, and $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ for a sample of 2 cases and for a sample of N cases from a normal population. The argument will be the same as in the text but will be algebraic, and hence general instead of arithmetical and particular.


SPECIAL CASE N = 2 



Distribution of All Possible Samples of Two from a Normal Population. Derivation. Let the first case in a sample be represented by $X_1$ and the second by $X_2$. Since the population is normal, the probability of $X_1$ lying in the interval $X_1$ to $X_1 + dX_1$ is

$$dP(X_1) = \frac{1}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{(X_1 - \tilde X)^2}{2\tilde\sigma^2}\right]dX_1 \qquad (1)$$

and the probability of $X_2$ lying in the interval $X_2$ to $X_2 + dX_2$ is

$$dP(X_2) = \frac{1}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{(X_2 - \tilde X)^2}{2\tilde\sigma^2}\right]dX_2 \qquad (2)$$


If samples of two are drawn at random from the population, the law of large numbers suggests that the relative frequency with which samples will have one case in the interval $X_1$ to $X_1 + dX_1$ and the other in the interval $X_2$ to $X_2 + dX_2$ may be predicted by the joint probability of $X_1$ and $X_2$, that is, by the product of Eqs. (1) and (2); thus

$$dP(X_1, X_2) = \frac{1}{2\pi\tilde\sigma^2}\exp\left[-\frac{(X_1 - \tilde X)^2 + (X_2 - \tilde X)^2}{2\tilde\sigma^2}\right]dX_1\,dX_2 \qquad (3)$$

Equation (3) is the mathematical formula for the distribution of all possible samples of two from a normal population. It is the model that any large number of actual random samples of two will tend to approximate.


Geometrical Properties. If the values of $X_1$ and $X_2$ for any sample are taken as the coordinates of a point in a plane, then the set of all possible samples will be represented by a cluster of points in this plane, and their distribution over the plane will be given by Eq. (3). The factor $dX_1\,dX_2$ of this equation will represent the area of the plane containing the specified sample values, and

$$\frac{1}{2\pi\tilde\sigma^2}\exp\left[-\frac{(X_1 - \tilde X)^2 + (X_2 - \tilde X)^2}{2\tilde\sigma^2}\right]$$

will represent the "density" of sample points in this area. Since the density factor increases as $X_1$ and $X_2$ approach $\tilde X$, it follows that the greatest concentration of sample points is in the immediate neighborhood of the point $(\tilde X, \tilde X)$. Furthermore, $(X_1 - \tilde X)^2 + (X_2 - \tilde X)^2$ measures the square of the distance from $(\tilde X, \tilde X)$. It follows that all sections of the plane equally distant from $(\tilde X, \tilde X)$ will have the same density of points, and that the density decreases as the distance from $(\tilde X, \tilde X)$ increases. The distribution of all samples of two thus has a circular symmetry about the central point $(\tilde X, \tilde X)$.

The point $(\tilde X, \tilde X)$, it will be noted, lies on the line through the origin that bisects the angle between the axes. This line (line CD of Fig. 64, or of Figs. 65a and 65b if projected to pass through the point where $X_1 = 0$ and $X_2 = 0$) is the locus of all samples that have two equal values. It thus has the equation $X_1 = X_2$, or $X_1 - X_2 = 0$. Also, the mean of any sample ($N = 2$) is $\bar X = (X_1 + X_2)/2$, or

$$X_1 + X_2 = 2\bar X$$

But this is the equation of a line that cuts the $X_1$ axis at $2\bar X$ and the $X_2$ axis at $2\bar X$ and is perpendicular to the line CD, meeting that line at $(\bar X, \bar X)$. Such a line is line AB of Fig. 64. All points on line AB have the same mean. It then follows that the mean of any sample can be found by finding where the line through the sample point, perpendicular to CD, cuts CD. Hence variation in mean from sample to sample can be represented by variation along CD. Also, the variance of a sample $X_1, X_2$ is by definition

$$\sigma^2 = \frac{(X_1 - \bar X)^2 + (X_2 - \bar X)^2}{2}$$

and $(X_1 - \bar X)^2 + (X_2 - \bar X)^2$ represents the square of the distance from the sample point $(X_1, X_2)$ to the mean point $(\bar X, \bar X)$. It follows that all sample points equidistant from their mean points have the same variances. But all mean points $(\bar X, \bar X)$ lie on line CD; hence all points equidistant from line CD have the same variance.¹ Loci of equal variances are accordingly lines, like lines EF and E′F′ of Fig. 64, that are parallel to line CD. Variation in sample variances is thus measured by variation perpendicular to these lines.
¹ See p. 247.

Distribution of All Possible Samples in Terms of Their Means and Variances. From the foregoing analysis, it follows that the plane of Fig. 64 can be divided into cells indicating variation in $\bar X$ and in $\sigma^2$ or $\sigma$, in lieu of cells indicating variation in $X_1$ and $X_2$. This might be done as follows:

Select any $dX_1$ interval on the $X_1$ axis and draw perpendicular lines through each end of this interval. Likewise, select any $dX_2$ interval on the $X_2$ axis (equal to $dX_1$) and draw lines through each end of it perpendicular to the $X_2$ axis. Let these two pairs of lines mark out the cell abcd of Fig. 70. The top and bottom of this cell will be $dX_1$ and the sides $dX_2$. This cell is like cell abcd in Fig. 63 on page 222. Next draw lines through a and c perpendicular to CD. These will mark off an interval on CD corresponding to an interval $d\bar X$ in the mean. Since $\bar X = (X_1 + X_2)/2$,

$$d\bar X = \frac{dX_1 + dX_2}{2},$$

and because $dX_1 = dX_2$, $d\bar X = dX_1$ or $dX_2$. Hence, in Fig. 70, $fg = ab\sqrt2 = dX_1\sqrt2 = d\bar X\sqrt2$. Finally, draw lines through b and d parallel to CD and call the perpendicular distance between them $d\sigma\sqrt2$. The variance, it will be recalled from page 227, equals one-half the square of the distance from CD. From the cell abcd there has thus been derived the new cell efgh of size $d\bar X\sqrt2 \cdot d\sigma\sqrt2 = 2\,d\bar X\,d\sigma$. When the whole of the plane has been cut up into cells like efgh, the plane will become a grid based on $\bar X$ and $\sigma$ intervals instead of $X_1$ and $X_2$ intervals.

Fig. 70.
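In Eq. (3) the factor that measures the variation in density of sample points from one part of the plane to the other is the quantity

$$\frac{1}{2\pi\tilde\sigma^2}\exp\left[-\frac{(X_1 - \tilde X)^2 + (X_2 - \tilde X)^2}{2\tilde\sigma^2}\right].$$

But the numerator of the exponential can be evaluated as follows:¹

$$(X_1 - \tilde X)^2 + (X_2 - \tilde X)^2 = (X_1 - \bar X)^2 + (X_2 - \bar X)^2 + 2(\bar X - \tilde X)^2 = 2\sigma^2 + 2(\bar X - \tilde X)^2$$

¹ For $N\sigma^2 = \sum d^2 - NC^2$, in which $d = X_i - \tilde X$ and $C = \bar X - \tilde X$. This is merely a version of the "short formula" for finding the standard deviation. In this application, it is to be remembered, $N = 2$.

This indicates that it is possible also to measure the variation in density of sample points in terms of $\bar X$ and $\sigma$.

This now permits the complete expression of the distribution of all samples in terms of $\bar X$ and $\sigma$. From the foregoing it follows that the probability of a sample having a mean lying between $\bar X$ and $\bar X + d\bar X$ and a $\sigma$ lying between $\sigma$ and $\sigma + d\sigma$ is equal to the density factor for the areas containing the specified values of $\bar X$ and $\sigma$ times the size of the area. The density factor is

$$\frac{1}{2\pi\tilde\sigma^2}\exp\left[-\frac{2\sigma^2 + 2(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right].$$

The area factor is double the area of square efgh in Fig. 70, i.e., double $2\,d\bar X\,d\sigma$, or $4\,d\bar X\,d\sigma$. The doubling of the area of efgh results from the fact that there is another square, e′f′g′h′, of the same size on the opposite side of line CD. That square also contains samples having the same mean and standard deviation as samples in square efgh. Since probability equals density factor times area, it follows that the probability of a sample having an $\bar X$ between $\bar X$ and $\bar X + d\bar X$ and a $\sigma$ between $\sigma$ and $\sigma + d\sigma$ is equal to

$$dP(\bar X, \sigma) = \frac{1}{2\pi\tilde\sigma^2}\exp\left[-\frac{2\sigma^2 + 2(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]4\,d\bar X\,d\sigma = \frac{1}{\frac{\tilde\sigma}{\sqrt2}\sqrt{2\pi}}\exp\left[-\frac{(\bar X - \tilde X)^2}{2\tilde\sigma^2/2}\right]d\bar X \cdot \frac{2}{\tilde\sigma\sqrt\pi}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma \qquad (4)$$

or, since $2\sigma\,d\sigma = d\sigma^2$,

$$dP(\bar X, \sigma^2) = \frac{1}{\frac{\tilde\sigma}{\sqrt2}\sqrt{2\pi}}\exp\left[-\frac{(\bar X - \tilde X)^2}{2\tilde\sigma^2/2}\right]d\bar X \cdot \frac{1}{\sigma\tilde\sigma\sqrt\pi}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2 \qquad (4')$$

Equation (4′) is the equation for the distribution of all sample points in terms of their means and variances.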
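The area factor $4\,d\bar X\,d\sigma$ can be checked numerically. The sketch below assumes the definitions $\bar X = (X_1 + X_2)/2$ and $\sigma = |X_1 - X_2|/2$. It computes the Jacobian of the change from $(X_1, X_2)$ to $(\bar X, \sigma)$ by central differences; a value of 1/2 means each cell $dX_1\,dX_2$ corresponds to $2\,d\bar X\,d\sigma$. It then integrates the right side of Eq. (4) over all means and standard deviations with a simple midpoint rule, assuming a population mean of 0 and standard deviation of 1, to confirm that the total probability is 1.

```python
import math

def mean_and_sd(x1: float, x2: float) -> tuple[float, float]:
    """Sample mean and standard deviation (divisor N = 2) of a sample of two."""
    return (x1 + x2) / 2, abs(x1 - x2) / 2

def jacobian_det(x1: float, x2: float, h: float = 1e-6) -> float:
    """|d(mean, sd) / d(x1, x2)| by central differences (valid away from x1 = x2)."""
    m_a, s_a = mean_and_sd(x1 + h, x2)
    m_b, s_b = mean_and_sd(x1 - h, x2)
    m_c, s_c = mean_and_sd(x1, x2 + h)
    m_d, s_d = mean_and_sd(x1, x2 - h)
    dm1, ds1 = (m_a - m_b) / (2 * h), (s_a - s_b) / (2 * h)
    dm2, ds2 = (m_c - m_d) / (2 * h), (s_c - s_d) / (2 * h)
    return abs(dm1 * ds2 - dm2 * ds1)

def eq4_density(mean: float, sd: float, pop_mean: float = 0.0, pop_sd: float = 1.0) -> float:
    """Right side of Eq. (4) per unit d(mean) d(sd), including the area factor 4."""
    return (4 / (2 * math.pi * pop_sd**2)) * math.exp(
        -(2 * sd**2 + 2 * (mean - pop_mean) ** 2) / (2 * pop_sd**2))

det = jacobian_det(3.0, 1.2)
print(det)  # 0.5, so each cell dX1 dX2 corresponds to 2 d(mean) d(sd)

h = 0.02  # midpoint rule over mean in (-6, 6) and sd in (0, 6)
total = sum(
    eq4_density(-6 + (i + 0.5) * h, (j + 0.5) * h)
    for i in range(600) for j in range(300)
) * h * h
print(total)  # close to 1
```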


Distribution of All Sample Means. Equation (4) gives the probability of a sample falling in either square efgh or square e′f′g′h′ of Fig. 70. It is the probability of a sample having a mean between $\bar X$ and $\bar X + d\bar X$ and a standard deviation between $\sigma$ and $\sigma + d\sigma$. To find the probability of a sample having a mean between $\bar X$ and $\bar X + d\bar X$ and a standard deviation of any value, it is merely necessary, by the addition theorem, to sum probabilities like Eq. (4) for all values of the standard deviation. This can be done with reference to Fig. 64 by summing up the probabilities of all cells lying between AB and A′B′. Algebraically this sum is found by integrating Eq. (4) with respect to $\sigma$. Thus


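$$dP(\bar X) = \frac{1}{\frac{\tilde\sigma}{\sqrt2}\sqrt{2\pi}}\exp\left[-\frac{(\bar X - \tilde X)^2}{2\tilde\sigma^2/2}\right]d\bar X \int_0^\infty \frac{2}{\tilde\sigma\sqrt\pi}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma$$

But the integral is of the form

$$\frac{2}{\sqrt{2\pi}}\int_0^\infty e^{-z^2/2}\,dz,$$

where $z = \sqrt2\,\sigma/\tilde\sigma$. It is equal to 1, since it is double one-half the area under the standard normal curve. Hence

$$dP(\bar X) = \frac{1}{\frac{\tilde\sigma}{\sqrt2}\sqrt{2\pi}}\exp\left[-\frac{(\bar X - \tilde X)^2}{2\tilde\sigma^2/2}\right]d\bar X = \frac{1}{\tilde\sigma_1\sqrt{2\pi}}\exp\left[-\frac{(\bar X - \tilde X)^2}{2\tilde\sigma_1^2}\right]d\bar X \qquad (5)$$

in which $\tilde\sigma_1 = \tilde\sigma/\sqrt2$.

This shows that, for a normal population, the distribution of means of samples of two is normal in form, with a mean equal to the mean of the population and a variance equal to the variance of the population divided by two.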
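A sampling experiment gives a rough check on Eq. (5) and on the results for the variance and for $t$ derived in the following paragraphs. The sketch draws many samples of two from a normal population whose mean (10) and standard deviation (2) are arbitrary choices. It compares three quantities with their theoretical values:

- the variance of the sample means, against $\tilde\sigma^2/2$;
- the average of $2\sigma^2/\tilde\sigma^2$, against 1, the mean of a $\chi^2$ distribution with $n = 1$;
- the proportion of samples with $|t| < 1$, against 0.5, the value for a $t$ distribution with $n = 1$.

```python
import random
import statistics

def pairs_experiment(reps: int, pop_mean: float = 10.0, pop_sd: float = 2.0, seed: int = 4):
    """Sample means, values of 2 sigma^2 / pop_sd^2, and values of t for samples of two."""
    rng = random.Random(seed)
    means, chi2s, ts = [], [], []
    while len(means) < reps:
        x1 = rng.gauss(pop_mean, pop_sd)
        x2 = rng.gauss(pop_mean, pop_sd)
        mean = (x1 + x2) / 2
        sd = abs(x1 - x2) / 2          # standard deviation with divisor N = 2
        if sd == 0:
            continue                   # practically impossible, but avoids division by zero
        means.append(mean)
        chi2s.append(2 * sd**2 / pop_sd**2)
        ts.append((mean - pop_mean) / sd)  # sqrt(N - 1) = 1 when N = 2
    return means, chi2s, ts

means, chi2s, ts = pairs_experiment(40000)
print(statistics.pvariance(means))            # compare with pop_sd**2 / 2 = 2.0
print(statistics.fmean(chi2s))                # compare with 1
print(sum(abs(t) < 1 for t in ts) / len(ts))  # compare with 0.5
```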
Distribution of All Sample Variances. The same process may be used to find the distribution of all sample variances. In this instance, the probabilities of all cells lying between lines GH and EF and lines G′H′ and E′F′ are summed. Algebraically the sum is found by integrating (4′) with reference to $\bar X$. Thus

$$dP(\sigma^2) = \frac{1}{\sigma\tilde\sigma\sqrt\pi}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2 \int_{-\infty}^{\infty}\frac{1}{\frac{\tilde\sigma}{\sqrt2}\sqrt{2\pi}}\exp\left[-\frac{(\bar X - \tilde X)^2}{2\tilde\sigma^2/2}\right]d\bar X$$

But the integral gives the total area under the normal curve and thus equals 1. Hence

$$dP(\sigma^2) = \frac{1}{\sigma\tilde\sigma\sqrt\pi}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2 \qquad (6)$$

If $\chi^2$ is set equal to $2\sigma^2/\tilde\sigma^2$, and it is noted that $\sqrt\pi = \left(-\tfrac12\right)!$, this becomes

$$dP(\chi^2) = \frac{1}{\sqrt2\left(-\tfrac12\right)!}\,e^{-\chi^2/2}\,(\chi^2)^{-1/2}\,d\chi^2$$

This shows that, for samples of two from a normal population, the sampling distribution of $2\sigma^2/\tilde\sigma^2$ has the form of a $\chi^2$ distribution with $n = 2 - 1 = 1$.¹


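¹ Cf. equation for the $\chi^2$ curve, page 111.

Distribution of All Sample Values of $t = \sqrt{N-1}\,(\bar X - \tilde X)/\sigma$. To find the distribution of sample values of $t$, note that by definition $t = \sqrt{N-1}\,(\bar X - \tilde X)/\sigma$, or, when $N = 2$, $t = (\bar X - \tilde X)/\sigma$. Hence, for $N = 2$ and a fixed value of $\sigma$,

$$t^2\sigma^2 = (\bar X - \tilde X)^2 \qquad \text{and} \qquad \sigma\,dt = d\bar X.$$

Substituting these values in (4′) gives

$$dP(t, \sigma^2) = \frac{1}{\frac{\tilde\sigma}{\sqrt2}\sqrt{2\pi}}\exp\left[-\frac{2t^2\sigma^2}{2\tilde\sigma^2}\right]\sigma\,dt \cdot \frac{1}{\sigma\tilde\sigma\sqrt\pi}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2$$

Combining terms yields

$$dP(t, \sigma^2) = \frac{1}{\pi\tilde\sigma^2}\exp\left[-\frac{\sigma^2}{\tilde\sigma^2}(1 + t^2)\right]d\sigma^2\,dt$$

which may be put in the form

$$dP(t, \sigma^2) = \frac{dt}{\pi(1 + t^2)}\exp\left[-\frac{\sigma^2}{\tilde\sigma^2}(1 + t^2)\right]d\left[\frac{\sigma^2}{\tilde\sigma^2}(1 + t^2)\right]$$

To get the distribution of $t$, integrate this for all values of $\sigma^2$ from 0 to $\infty$. Thus

$$dP(t) = \frac{dt}{\pi(1 + t^2)}\int_0^\infty e^{-y}\,dy, \qquad \text{where } y = \frac{\sigma^2}{\tilde\sigma^2}(1 + t^2)$$

But the integral equals $\left[-e^{-y}\right]_0^\infty$, which equals 1. Hence

$$dP(t) = \frac{dt}{\pi(1 + t^2)},$$

which is seen to be of the form of the $t$ distribution with $n = N - 1 = 2 - 1 = 1$. Hence $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ is distributed like $t$ with $n = 1$.¹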
¹ Cf. equation for the $t$ distribution, p. 111.

THE GENERAL CASE N > 2

Before proceeding to the general case, it will be well to prepare the ground by reviewing several mathematical functions and relationships.

N-dimensional Geometry. Those who have had coordinate geometry will remember that a linear equation in two variables may be represented by a straight line drawn on the coordinate plane. If the equation is put in the form $aX_1 + bX_2 + c = 0$, then $-c/b$ is the $X_2$ intercept, $-c/a$ is the $X_1$ intercept, and $-a/b$ is the slope of the line with the $X_1$ axis. The "direction ratios" of the line are $b$ and $-a$; these are factors proportional to the cosines of the angles the line makes with the $X_1$ and $X_2$ axes. Thus a line perpendicular to $aX_1 + bX_2 + c = 0$ would be one whose slope equals $+b/a$ and whose direction ratios are $a$ and $b$. In particular, a line through the origin that made equal angles with the axes would be $X_1 - X_2 = 0$, and a line perpendicular to this line would be $X_1 + X_2 = c$. Similarly, in three dimensions, a linear equation of the form

$$aX_1 + bX_2 + dX_3 + c = 0$$

represents a plane, and $a$, $b$, $d$ are the direction ratios of a line perpendicular to the plane. If this is generalized, it may be said that a linear equation of the form

$$aX_1 + bX_2 + \cdots + kX_N + c = 0$$

represents a hyperplane, or flat space, in N dimensions, and $a, b, \ldots, k$ are the direction ratios of a line perpendicular to this hyperplane. A line in a two-dimensional plane is said to have one dimension (i.e., length), and a plane in three dimensions is said to have two dimensions (i.e., length and breadth). Just so, a hyperplane in N dimensions is said to have $N - 1$ dimensions. It is to be noted that this use of N-dimensional space is merely a convenient way of generalizing certain relationships that have concrete counterparts in two and three dimensions. There can be no real N-dimensional space.

This generalization of two- and three-dimensional relationships may be extended to other figures. Thus, in two dimensions, $(X_1 - a)^2 + (X_2 - b)^2 = r^2$ is the equation for a circle with center at $(a, b)$ and radius equal to $r$. In three dimensions, $(X_1 - a)^2 + (X_2 - b)^2 + (X_3 - c)^2 = r^2$ is a sphere with center at $(a, b, c)$ and radius equal to $r$. If these relationships are generalized, it may be said that

$$(X_1 - a)^2 + (X_2 - b)^2 + \cdots + (X_N - k)^2 = r^2$$

represents a hypersphere in N-dimensional space with center at $(a, b, c, \ldots, k)$ and radius equal to $r$. The circumference of a circle is of one dimension (being only a line), the surface of a sphere is of two dimensions, and the surface of a hypersphere in N dimensions will be of $N - 1$ dimensions. These N-dimensional notions will be useful in the analysis that follows.


The Gamma Function. The integral

$$\Gamma(m) = \int_0^\infty e^{-x}x^{m-1}\,dx, \qquad m > 0 \qquad (7)$$

is known as the "gamma function." Integration by parts shows that the integral on the right equals

$$\left[-e^{-x}x^{m-1}\right]_0^\infty + (m - 1)\int_0^\infty e^{-x}x^{m-2}\,dx \qquad (8)$$

For $m > 1$, the first term of Eq. (8) equals 0 when $x = 0$ and also when $x = \infty$, and the second term is seen to be equal to $(m - 1)\Gamma(m - 1)$. Hence we have the relationship

$$\Gamma(m) = (m - 1)\Gamma(m - 1) \qquad (9)$$

By Eq. (9), $\Gamma(m - 1) = (m - 2)\Gamma(m - 2)$, etc.; therefore, if $m$ is a positive integer,¹

$$\Gamma(m) = (m - 1)(m - 2)(m - 3)\cdots 1 = (m - 1)! \qquad (10)$$

¹ The last term will be $\Gamma(1) = \int_0^\infty e^{-x}\,dx = \left[-e^{-x}\right]_0^\infty = 1$.

Equation (10) holds for positive integral values of $m$. For fractional values of $m$, it is taken as the definition of $m!$. Thus, in general, $m!$ is defined by

$$m! = \Gamma(m + 1) \qquad (10')$$


In statistical theory, $\Gamma\left(\tfrac12\right)$ is of particular interest; for if $m$ is set equal to $\tfrac12$ in Eq. (7), it becomes

$$\Gamma\left(\tfrac12\right) = \int_0^\infty e^{-x}x^{-1/2}\,dx$$

By setting $x = y^2/2k^2$, this may be put in the form

$$\Gamma\left(\tfrac12\right) = \frac{\sqrt2}{k}\int_0^\infty e^{-y^2/2k^2}\,dy$$

If the right member is multiplied by $\sqrt{2\pi}/\sqrt{2\pi}$, this becomes

$$\Gamma\left(\tfrac12\right) = 2\sqrt\pi\int_0^\infty \frac{1}{k\sqrt{2\pi}}e^{-y^2/2k^2}\,dy \qquad (11)$$

But the integral is recognized as half the area of a normal frequency curve and is thus equal to $\tfrac12$. Hence, Eq. (11) reduces to¹

$$\Gamma\left(\tfrac12\right) = \sqrt\pi \qquad (12)$$
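The properties of the gamma function used above can be confirmed with Python's `math.gamma`. The sketch checks the recursion of Eq. (9), including for fractional arguments, the factorial relation of Eq. (10), and Eq. (12). The helper `fact` is a hypothetical name for the generalized factorial of Eq. (10′).

```python
import math

# Eq. (9): Gamma(m) = (m - 1) Gamma(m - 1), for fractional m as well
print(all(math.isclose(math.gamma(m), (m - 1) * math.gamma(m - 1))
          for m in (1.5, 2.5, 3.7, 6.0)))

# Eq. (10): Gamma(m) = (m - 1)! when m is a positive integer
print(all(math.isclose(math.gamma(m), math.factorial(m - 1)) for m in range(1, 10)))

def fact(m: float) -> float:
    """Generalized factorial of Eq. (10'): m! = Gamma(m + 1)."""
    return math.gamma(m + 1)

print(fact(-0.5), math.sqrt(math.pi))    # Eq. (12): (-1/2)! = Gamma(1/2) = sqrt(pi)
print(fact(0.5), math.sqrt(math.pi) / 2)
```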


In conclusion, it may be noted that

$$\int_0^{x} e^{-u}u^{m-1}\,du$$

is known as the "incomplete gamma function" ("incomplete" because the integral is from 0 to $x$ instead of from 0 to $\infty$). Tables of this function have been computed by Karl Pearson and his staff and have been published by the Cambridge University Press.

¹ The integral $\int_0^\infty e^{-y^2/2k^2}\,dy$ can be evaluated without making use of knowledge of the normal curve. See, for example, John F. Kenney, Mathematics of Statistics (1939), Part II, pp. 35-37.

Distribution of All Possible Samples. Derivation. The first step in deriving the sampling distributions of means, variances, and $t$ is to obtain the distribution of all possible samples of N from a normal population. This may be accomplished by direct application of the multiplication theorem.


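The probability of a case lying between $X_1$ and $X_1 + dX_1$ is

$$dP(X_1) = \frac{1}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{(X_1 - \tilde X)^2}{2\tilde\sigma^2}\right]dX_1 \qquad (13)$$

Similarly, the probability of a case lying between $X_2$ and $X_2 + dX_2$ is

$$dP(X_2) = \frac{1}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{(X_2 - \tilde X)^2}{2\tilde\sigma^2}\right]dX_2 \qquad (13')$$

and the same sort of equation holds for the probability of $X_3, X_4, \ldots, X_N$.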

If samples of N are drawn at random from an infinite normal population, the law of large numbers suggests how often a particular kind of sample will occur. Consider a sample in which one case lies between $X_1$ and $X_1 + dX_1$, another between $X_2$ and $X_2 + dX_2$, a third between $X_3$ and $X_3 + dX_3$, and so on. The relative frequency, or probability, of such a sample will be predicted by the joint probability of $X_1, X_2, X_3, \ldots, X_N$, that is, by

$$dP(X_1, X_2, \ldots, X_N) = \frac{1}{(\tilde\sigma\sqrt{2\pi})^N}\exp\left[-\frac{\sum(X_i - \tilde X)^2}{2\tilde\sigma^2}\right]dX_1\,dX_2\cdots dX_N \qquad (14)$$

Equation (14) is the equation for the distribution of all possible samples of N from a normal population. It is the model that any large number of samples of N will tend to approximate.
Geometrical Representation and Properties. Suppose the sample values $X_1, X_2, \ldots, X_N$ are represented by a point in N-dimensional space with coordinates $X_1, X_2, \ldots, X_N$. Then Eq. (14) represents a cluster of sample points with center at $(\tilde X, \tilde X, \tilde X, \ldots, \tilde X)$ (see Fig. 71a).

Geometrically, Eq. (14) describes any N-dimensional cell whose sides extend from $X_1$ to $X_1 + dX_1$, $X_2$ to $X_2 + dX_2$, ..., $X_N$ to $X_N + dX_N$. In such a cell the density of the sample points is given by

$$\frac{1}{(\tilde\sigma\sqrt{2\pi})^N}\exp\left[-\frac{\sum(X_i - \tilde X)^2}{2\tilde\sigma^2}\right]$$

and the probability of a sample falling in this cell is equal to the product of this density factor times the volume of the cell, $dX_1\,dX_2\,dX_3\cdots dX_N$. From this interpretation it follows that the density of sample points is greatest when $X_1, X_2, \ldots, X_N$ all equal $\tilde X$, that is, at the center of the cluster. $\sum(X_i - \tilde X)^2$ equals the square of the distance of a sample point from the center point $(\tilde X, \ldots, \tilde X)$. Hence Eq. (14) also shows that the density of sample points is the same for all cells lying on a hypersphere with center at $(\tilde X, \tilde X, \ldots, \tilde X)$ and radius equal to $\sqrt{\sum(X_i - \tilde X)^2}$.

Fig. 72.


The center point $(\tilde X, \tilde X, \ldots, \tilde X)$ lies on the line OM, Fig. 72, which passes through the origin and whose points all have equal coordinates. All samples with identical values lie on this line. The mean of any sample is

$$\bar X = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

For a given value of $\bar X$ this defines a hyperplane that is perpendicular to the line OM, since its direction ratios are all equal. In other words, all samples that have the same mean lie on a plane that is perpendicular to line OM and cuts this line in the point $(\bar X, \bar X, \ldots, \bar X)$. It follows that variation in the mean of a sample can be represented by variation along the line OM.

The variance of a sample is $\sigma^2 = \sum(X_i - \bar X)^2/N$. Thus $N\sigma^2$ represents the square of the distance of a sample point from the point lying on OM whose coordinates are all equal to the mean of the sample. All samples with the same variance thus lie on a hypercylinder whose axis is the line OM. It follows that variation in the variance of a sample can be represented by variation in the square of the perpendicular distance of a sample point from the line OM.
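The geometric statements about the line OM can be checked for a particular sample. The sketch projects a sample point onto the unit vector $(1, 1, \ldots, 1)/\sqrt N$, which points along OM. It confirms that the foot of the perpendicular is $(\bar X, \bar X, \ldots, \bar X)$ and that the squared perpendicular distance equals $N\sigma^2$. The sample values are arbitrary.

```python
import math

def project_onto_om(xs: list[float]) -> tuple[list[float], float]:
    """Foot of the perpendicular from the sample point xs to the line OM
    (the line of equal coordinates) and the squared distance to that line."""
    n = len(xs)
    u = [1 / math.sqrt(n)] * n                   # unit vector along OM
    along = sum(x * ui for x, ui in zip(xs, u))  # signed distance along OM
    foot = [along * ui for ui in u]
    dist2 = sum((x - f) ** 2 for x, f in zip(xs, foot))
    return foot, dist2

xs = [4.0, 7.0, 1.0, 6.0]
foot, dist2 = project_onto_om(xs)
n = len(xs)
mean = sum(xs) / n
variance = sum((x - mean) ** 2 for x in xs) / n  # divisor N, as in the text
print(foot)                  # every coordinate equals the sample mean, 4.5
print(dist2, n * variance)   # both equal N * sigma^2 = 21
```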

Distribution of All Possible Samples in Terms of Their Means and Variances. The foregoing analysis suggests that the distribution of all possible samples may be given in terms of the means and variances of the samples, provided that the proper element of volume is found. As indicated above, the probability of a sample is the same for all "cubical cells" lying on the surface of a hypersphere, the probability being proportional to $\exp\left[-\sum(X_i - \tilde X)^2/2\tilde\sigma^2\right]$. If $\sum(X_i - \tilde X)^2$ is set equal to $S^2$, this means that the density of sample points is constant throughout a shell with center at $(\tilde X, \tilde X, \ldots, \tilde X)$, radius equal to $S$, and thickness equal to $dS$. But¹

$$\sum(X_i - \tilde X)^2 = \sum(X_i - \bar X)^2 + N(\bar X - \tilde X)^2 = N\sigma^2 + N(\bar X - \tilde X)^2$$

Hence the probability of samples for all cells lying in a given shell is proportional to

$$\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[-\frac{N(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]$$

that is, to the product of a function depending on the variance of a sample by a function depending on the mean of the sample. Suppose the shell in question is cut by two hyperplanes, both perpendicular to OM. The first passes through the point $(\bar X, \bar X, \ldots, \bar X)$ on OM, and the second through the point $(\bar X + d\bar X, \bar X + d\bar X, \ldots, \bar X + d\bar X)$. These planes will cut off a shell of one less dimension,² throughout which the density of sample points continues to be proportional to

$$\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[-\frac{N(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]$$

Within an error dependent upon infinitesimals of higher order, this second shell may be approximated by a "rectangular ring."³ The ring's inner radius is $\sqrt{\sum(X_i - \bar X)^2} = \sqrt N\,\sigma$, its outer radius is $\sqrt N\,(\sigma + d\sigma)$, and its width is $d\bar X$ (see Fig. 74B, C). The proportionate frequency of points in this second shell will be approximated by the product of the density factor times the volume of the shell, or some factor proportional to this volume. This relative frequency will represent the relative frequency, or probability, of all points whose means lie between $\bar X$ and $\bar X + d\bar X$ and whose standard deviations lie between $\sigma$ and $\sigma + d\sigma$. It will thus give an expression for the distribution of all possible samples in terms of their means and standard deviations.

¹ This is merely a "short" equation for obtaining the sum of the squares of the deviations from the mean of a sample. It finds the sum of the squares of the deviations from an arbitrary origin (here the mean of the population) and uses the difference between the sample mean and this origin.
² In three dimensions, the first shell is a true shell and the second is a "ring" of width approximately equal to $d\bar X$ and thickness equal to $d\sigma$ (see Fig. 74B, C).
³ The argument is thought out in three dimensions but is generalized to N dimensions. Figure 74A shows half the three-dimensional ring, and Figure 74B shows the rectangular approximation. The areas of the cross sections of the two shapes are practically the same.

Fig. 73.
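The "short" equation used above holds for any origin, not only the population mean. This sketch verifies $\sum(X_i - \tilde X)^2 = N\sigma^2 + N(\bar X - \tilde X)^2$ for a small sample and for a simulated sample, each with an arbitrarily chosen origin standing in for $\tilde X$.

```python
import random

def decompose(xs: list[float], origin: float) -> tuple[float, float, float]:
    """Split the squared distance from (origin, ..., origin) into N*sigma^2
    and N*(mean - origin)^2, as in the 'short' equation."""
    n = len(xs)
    mean = sum(xs) / n
    total = sum((x - origin) ** 2 for x in xs)
    within = sum((x - mean) ** 2 for x in xs)  # N * sigma^2
    between = n * (mean - origin) ** 2          # N * (mean - origin)^2
    return total, within, between

print(decompose([4.0, 7.0, 1.0, 6.0], origin=3.0))  # (30.0, 21.0, 9.0)

rng = random.Random(5)
xs = [rng.gauss(50, 10) for _ in range(12)]
total, within, between = decompose(xs, origin=48.0)
print(total, within + between)  # equal apart from rounding
```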

Since the radius of the shell in question is proportional to c, 
its volume will be proportional to (z)*7? de dX. For а hyper- 
plane of N dimensions will intersect an N-dimensional hypersphere 
in another hypersphere of N — 1 dimensions (e.g., а plane will 
intersect а sphere in a circle), and the generalized surface area 


(С, 


Fie. 74. 

of the latter will be proportional to the radius raised to the 
(N — 2)th power. For example, the surface of a three-dimen- 
sional sphere is proportional to 7? and the circumference of a 
circle to г. The total volume will equal this surface area multi- 
plied by the width of the shell dX and its thickness йо. Multi- 
plying the volume of the shell by the density factor thus gives 
the relative frequency of the samples lying in the shell as pro- 
portional to — 

oX—2 exp ES | exp [ im 4 do dX 


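But dσ² = 2σ dσ, or dσ = dσ²/(2σ); therefore, apart from a constant factor, the above may be written

$$(\sigma^2)^{\frac{N-3}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[-\frac{N(\bar X-\tilde X)^2}{2\tilde\sigma^2}\right]d\sigma^2\,d\bar X$$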
Thus it is to be concluded that the relative frequency of a sample having a mean lying between X̄ and X̄ + dX̄ and a variance lying between σ² and σ² + dσ² is

$$dP(\bar X,\sigma^2)=K(\sigma^2)^{\frac{N-3}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[-\frac{N(\bar X-\tilde X)^2}{2\tilde\sigma^2}\right]d\sigma^2\,d\bar X \tag{15}$$

where K is some constant independent of σ² and X̄.

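Distribution of All Sample Means. If σ² is given a particular value, Eq. (15) gives the distribution of sample means; i.e., it gives the relative frequencies of samples with different mean values, but all the same σ². It shows that these relative frequencies are proportional to

$$\exp\left[-\frac{(\bar X-\tilde X)^2}{2\tilde\sigma^2/N}\right]$$

which is obviously the form of a normal distribution with mean at X̃ and variance σ̃²/N. It also shows that this distribution is the same no matter what value σ² has. In other words, the distribution of sample means is independent of the value of the sample variance. Hence, if the relative frequencies are summed for all sample variance values, the relative frequencies of various mean values will be given in general by

$$dP(\bar X)=K_1\exp\left[-\frac{(\bar X-\tilde X)^2}{2\tilde\sigma^2/N}\right]d\bar X$$

To obtain K₁, integrate the expression above from −∞ to +∞ and set it equal to 1, the total frequency. Thus

$$\int_{-\infty}^{+\infty}K_1\exp\left[-\frac{(\bar X-\tilde X)^2}{2\tilde\sigma^2/N}\right]d\bar X=1$$

or, setting y = √N(X̄ − X̃)/σ̃ and noting that dy = (√N/σ̃) dX̄,

$$K_1\,\frac{\tilde\sigma}{\sqrt N}\int_{-\infty}^{+\infty}e^{-y^2/2}\,dy=K_1\,\frac{\tilde\sigma}{\sqrt N}\sqrt{2\pi}=1,\qquad\text{so that}\qquad K_1=\frac{\sqrt N}{\tilde\sigma\sqrt{2\pi}}$$

Hence

$$dP(\bar X)=\frac{\sqrt N}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{N(\bar X-\tilde X)^2}{2\tilde\sigma^2}\right]d\bar X \tag{16}$$

which is the distribution of all sample means.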
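Equation (16) can be checked empirically. The following Python sketch is a Monte Carlo illustration, not part of the original derivation: it draws many samples of size N from a normal population with hypothetical parameters (mean 0, standard deviation 2, N = 10, chosen only for the check) and confirms that the sample means center on the population mean with a variance close to σ̃²/N.

```python
import random
import statistics


def simulate_sample_means(pop_mean, pop_sd, n, trials, seed=1):
    """Draw `trials` samples of size n from a normal population; return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(sum(sample) / n)
    return means


# Illustrative parameters only (not taken from the text).
means = simulate_sample_means(pop_mean=0.0, pop_sd=2.0, n=10, trials=20_000)
print(statistics.fmean(means))      # should be near the population mean, 0
print(statistics.pvariance(means))  # should be near sigma~^2 / N = 4 / 10 = 0.4
```

With more trials the simulated variance settles ever closer to σ̃²/N, as Eq. (16) requires.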

Distribution of All Sample Variances. If X̄ is given a particular value, Eq. (15) gives the distribution of sample variances, i.e., the relative frequencies of samples with different σ²'s, but all the same X̄'s. It shows that these relative frequencies are proportional to

$$(\sigma^2)^{\frac{N-3}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]$$

and thus indicates that the distribution of sample variances is independent of the value of the sample mean. Hence, if the relative frequencies are summed for all sample mean values, the relative frequencies of various variances will be given in general by

$$dP(\sigma^2)=K_2(\sigma^2)^{\frac{N-3}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2$$


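To obtain K₂, integrate this expression from 0 to +∞ and set it equal to 1, the total frequency. Thus

$$\int_0^{\infty}K_2(\sigma^2)^{\frac{N-3}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2=1$$

But if Nσ²/σ̃² is set equal to χ², this may be put in the form

$$K_2\left(\frac{2\tilde\sigma^2}{N}\right)^{\frac{N-1}{2}}\int_0^{\infty}\left(\frac{\chi^2}{2}\right)^{\frac{N-3}{2}}e^{-\chi^2/2}\,d\left(\frac{\chi^2}{2}\right)=1$$

But the integral equals Γ((N − 1)/2); therefore,

$$K_2=\frac{1}{\Gamma\left(\frac{N-1}{2}\right)}\left(\frac{N}{2\tilde\sigma^2}\right)^{\frac{N-1}{2}}$$

and

$$dP(\sigma^2)=\frac{1}{\Gamma\left(\frac{N-1}{2}\right)}\left(\frac{N}{2\tilde\sigma^2}\right)^{\frac{N-1}{2}}(\sigma^2)^{\frac{N-3}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2 \tag{17}$$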
If Nσ²/σ̃² is set equal to χ², this takes the form of a χ² distribution with n = N − 1. Since dσ² = 2σ dσ, the distribution of σ is

$$dP(\sigma)=\frac{2}{\Gamma\left(\frac{N-1}{2}\right)}\left(\frac{N}{2\tilde\sigma^2}\right)^{\frac{N-1}{2}}\sigma^{N-2}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]d\sigma \tag{18}$$
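As a numerical check on Eq. (17), the Python sketch below (an illustration, not from the text) evaluates the density of the sample variance with Simpson's rule and confirms that it integrates to 1 and that its mean is (N − 1)σ̃²/N. This is the same as saying that Nσ²/σ̃² has mean N − 1, the mean of χ² with n = N − 1. The values N = 10 and σ̃² = 1 are arbitrary choices for the check, and the function names are hypothetical.

```python
import math


def variance_density(s2, n, pop_var):
    """Eq. (17): density of the divisor-N sample variance s2 for samples of size n."""
    if s2 <= 0.0:
        return 0.0  # the density vanishes at zero when n > 3
    k = (n - 1) / 2
    log_k2 = k * math.log(n / (2 * pop_var)) - math.lgamma(k)  # log of K2
    return math.exp(log_k2 + (k - 1) * math.log(s2) - n * s2 / (2 * pop_var))


def simpson(f, a, b, steps=4000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


N, POP_VAR = 10, 1.0
# The upper limit 10 stands in for infinity: the density is negligible beyond it here.
area = simpson(lambda s2: variance_density(s2, N, POP_VAR), 0.0, 10.0)
mean = simpson(lambda s2: s2 * variance_density(s2, N, POP_VAR), 0.0, 10.0)
print(round(area, 6), round(mean, 6))  # 1.0 0.9
```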


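Distribution of Sample Values of √N(X̄ − X̃)/σ̂. Since the distribution of sample variances is independent of the distribution of sample means, and vice versa, their joint distribution is equal to the product of their individual distributions, shown by Eqs. (16) and (17). The exact form for Eq. (15) is thus given by the product of Eqs. (16) and (17), or

$$dP(\bar X,\sigma^2)=\frac{\sqrt N}{\tilde\sigma\sqrt{2\pi}}\,\frac{1}{\Gamma\left(\frac{N-1}{2}\right)}\left(\frac{N}{2\tilde\sigma^2}\right)^{\frac{N-1}{2}}(\sigma^2)^{\frac{N-3}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[-\frac{N(\bar X-\tilde X)^2}{2\tilde\sigma^2}\right]d\sigma^2\,d\bar X \tag{15'}$$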


The distribution of √N(X̄ − X̃)/σ̂ may be obtained from Eq. (15′) as follows: First set the symbol t = √N(X̄ − X̃)/σ̂ and note that σ̂ = σ√(N/(N − 1)). Therefore t = √(N − 1)(X̄ − X̃)/σ, while, for a fixed value of σ, dt = √(N − 1) dX̄/σ. Substituting tσ/√(N − 1) for (X̄ − X̃) and σ dt/√(N − 1) for dX̄ in Eq. (15′) yields
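$$dP(t,\sigma^2)=\frac{\sqrt N}{\tilde\sigma\sqrt{2\pi}}\,\frac{1}{\Gamma\left(\frac{N-1}{2}\right)}\left(\frac{N}{2\tilde\sigma^2}\right)^{\frac{N-1}{2}}(\sigma^2)^{\frac{N-3}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[-\frac{N\sigma^2t^2}{2\tilde\sigma^2(N-1)}\right]\frac{\sigma}{\sqrt{N-1}}\,d\sigma^2\,dt$$

which may be put in the form

$$dP(t,\sigma^2)=\frac{\sqrt N}{\tilde\sigma\sqrt{2\pi(N-1)}\,\Gamma\left(\frac{N-1}{2}\right)}\left(\frac{N}{2\tilde\sigma^2}\right)^{\frac{N-1}{2}}(\sigma^2)^{\frac{N-2}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\left(1+\frac{t^2}{N-1}\right)\right]d\sigma^2\,dt$$

Further manipulation shows that the following is the equivalent of the foregoing:

$$dP(t,\sigma^2)=\frac{dt}{\sqrt{\pi(N-1)}\,\Gamma\left(\frac{N-1}{2}\right)\left(1+\frac{t^2}{N-1}\right)^{N/2}}\left[\frac{N\sigma^2}{2\tilde\sigma^2}\left(1+\frac{t^2}{N-1}\right)\right]^{\frac{N-2}{2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\left(1+\frac{t^2}{N-1}\right)\right]d\left[\frac{N\sigma^2}{2\tilde\sigma^2}\left(1+\frac{t^2}{N-1}\right)\right]$$

Here t is taken as a constant in d[(Nσ²/2σ̃²)(1 + t²/(N − 1))]. If the foregoing is summed or integrated over σ² for a given value of t, the probability of that t is given by

$$dP(t)=\frac{dt}{\sqrt{\pi(N-1)}\,\Gamma\left(\frac{N-1}{2}\right)\left(1+\frac{t^2}{N-1}\right)^{N/2}}\int_0^{\infty}u^{\frac{N-2}{2}}e^{-u}\,du,\qquad u=\frac{N\sigma^2}{2\tilde\sigma^2}\left(1+\frac{t^2}{N-1}\right)$$

But the integral is of the form ∫₀^∞ u^((N−2)/2) e^(−u) du, which equals Γ(N/2). Hence, the distribution of t is

$$dP(t)=\frac{\Gamma\left(\frac N2\right)}{\sqrt{\pi(N-1)}\,\Gamma\left(\frac{N-1}{2}\right)}\,\frac{dt}{\left(1+\frac{t^2}{N-1}\right)^{N/2}}$$

or, if n is set equal to N − 1,

$$dP(t)=\frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac n2\right)}\,\frac{dt}{\left(1+\frac{t^2}{n}\right)^{\frac{n+1}{2}}}$$

which is recognized as the form of the t distribution. Hence √N(X̄ − X̃)/σ̂ is distributed like t with n = N − 1.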
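The t density just derived can be evaluated directly. The Python sketch below is an illustration, not the book's method of tabulation, and it assumes simple numerical integration is accurate enough. It codes dP(t)/dt, integrates it with Simpson's rule, and locates by bisection the two points that the following chapter reads from a t table for n = 9: 2.262 (.025 in each tail) and 1.833 (.05 in one tail). The function names are hypothetical.

```python
import math


def t_density(t, n):
    """dP(t)/dt = Gamma((n+1)/2) / (sqrt(n*pi) Gamma(n/2)) * (1 + t^2/n)^(-(n+1)/2)."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c) * (1 + t * t / n) ** (-(n + 1) / 2)


def t_cdf(x, n, steps=2000):
    """P(t <= x): Simpson's rule on [0, |x|], then symmetry about zero."""
    b = abs(x)
    h = b / steps
    total = t_density(0.0, n) + t_density(b, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, n)
    half = total * h / 3  # area under the density from 0 to |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def t_upper_point(p, n):
    """The value t_p with P(t >= t_p) = p, found by bisection (0 < p < 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, n) > p:
            lo = mid  # upper tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_upper_point(0.025, 9), 3))  # 2.262
print(round(t_upper_point(0.05, 9), 3))   # 1.833
```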


CHAPTER XI 


SAMPLING FROM CONTINUOUS NORMAL POPULATIONS 
II. USES OF THE SAMPLING DISTRIBUTIONS


SAMPLING DISTRIBUTION OF THE MEAN 


Used to Test a Hypothesis. Problem When Population Variance Is Known. The practical use of the sampling distributions discussed in the preceding chapter may be explained by reference to several examples. Consider first an illustration of how the sampling distribution of the mean is used to test a hypothesis. Suppose it is claimed that a new process of manufacturing electric light bulbs will increase their mean length of life to 1,100 kilowatt-hours. A manufacturer tries out this new process and produces a sample batch of 100 bulbs for which the mean length of life proves to be 1,090 kilowatt-hours. The new process, it may be supposed, does not change the variability in length of life from bulb to bulb, and therefore the standard deviation in length of life may be taken as that of the old process. Suppose this is known to be 200 kilowatt-hours. The question to which the manufacturer seeks an answer is therefore this: Is it reasonable to infer that the given sample is from a population in which the mean is 1,100 kilowatt-hours? This is the question that will now be discussed in some detail.

Coefficient of Risk. Before answering the question the manufacturer must first determine what degree of risk he is willing to undergo in rejecting the hypothesis when it is true. Suppose he is willing to make this mistake once in twenty times on the average; his coefficient of risk will then be .05.

Region of Rejection. The next step is to select a "region of rejection," which on the basis of the given hypothesis will mark off the unreasonable or unacceptable samples from the others, the probability of this subset of samples being just .05. Since the sampling distribution of the mean is normal in form, the argument is essentially the same as that pertaining to percentages (see Chap. IX).
If the manufacturer is indifferent as to what mean value other than the hypothetical mean value is the true one, he will do best to distribute his .05 region of rejection equally at both ends of the sampling distribution. If he wishes to reject the hypothesis more often when the true mean value is actually below the hypothetical mean value than when it is above this value, he will do best to put the .05 region entirely at the lower end of the distribution. If he wishes to reject the hypothesis more often when the true mean value is actually above the hypothetical mean value than when it is below this value, he will do best to put his .05 region entirely at the upper end of the sampling distribution.

In the present instance, a region of rejection lying entirely at the lower end of the distribution would appear to be the best region to adopt; for the manufacturer would not care if the claims of the new process were more than substantiated. He would care only if they fell short of being true.


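Since the standard error of the mean is the standard deviation of the population divided by √N, and since a normally distributed variate falls 1.645 or more standard deviations below its mean with probability .05, a .05 region of rejection lying entirely at the lower end of the distribution would consist of all samples whose means lay below 1,100 (the hypothetical population mean) minus

$$1.645\,\frac{200}{\sqrt{100}}=32.9$$

that is, below 1,100 − 32.9 = 1,067.1. A graph of the sampling distribution of the mean and this region of rejection is shown in Fig. 75.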
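The cutoff 1,067.1 can be reproduced with Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method returns normal percentage points in place of a printed table. The helper name `lower_tail_z_test` is hypothetical, introduced only for this sketch.

```python
import math
from statistics import NormalDist


def lower_tail_z_test(sample_mean, hyp_mean, pop_sd, n, risk=0.05):
    """Lower-tail test of a hypothetical mean when the population s.d. is known.

    Returns the cutoff below which sample means fall in the region of
    rejection, and whether the observed sample mean falls there.
    """
    std_error = pop_sd / math.sqrt(n)     # standard error of the mean
    z_point = NormalDist().inv_cdf(risk)  # about -1.645 when risk = .05
    cutoff = hyp_mean + z_point * std_error
    return cutoff, sample_mean < cutoff


cutoff, reject = lower_tail_z_test(1090, 1100, 200, 100)
print(round(cutoff, 1), reject)  # 1067.1 False
```

Since 1,090 lies above the cutoff, the sample does not fall in the region of rejection.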


Other Examples. Other cases might be conceived of in which an upper region of rejection or an evenly distributed region of rejection would be the proper region to employ. For example, if a process of cigarette manufacture is claimed to yield cigarettes of a low burning temperature, an upper region of rejection would be appropriate. Or a manufacturer of automobile tires may wish to copy the tires of a competitor as closely as possible, and in this case he might be equally eager to avoid turning out tires that were markedly worse or markedly better than the competing brand; here an equally distributed region of rejection would be in order.


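[Figure: sampling distribution of the mean about the hypothetical mean 1,100, with the region of rejection below 1,067.1.]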


Confidence Limits for the Population Mean. Problem When the Population Variance Is Known. In many cases it is the aim of the statistical investigation to determine confidence limits for the mean of the population from which a sample has been drawn rather than to test any particular hypothesis. Again this may be done in a manner that is very similar to the process of determining confidence limits for percentage figures. The following discussion will relate to the electric-light bulb example described in the preceding section. As before, it will be assumed that the population variance is known.

The Confidence Coefficient. Confidence limits, it will be recalled, are the end points of a certain range of values that may be said to include the true value with a given degree of probability. This degree of probability is called the "confidence coefficient" associated with the given confidence interval. Suppose in the present instance that the manufacturer of electric-light bulbs adopts a confidence coefficient of .95; that is, he wishes to be able to say that there is a probability of .95 that the confidence interval he establishes covers the true mean value.

The Confidence Interval. As previously pointed out, innumerable confidence intervals may be set up that will all have a confidence coefficient of .95. Suppose the manufacturer is as much concerned with underestimating the true mean value as with overestimating it. Then he will proceed as follows:

He will select a lower value such that if it is the true mean value the probability of getting the given sample mean or some higher value will be just .025. The process is illustrated in Fig. 76A. Next he will select some upper value such that if it is the true value the probability of getting the given sample mean or some lower value will be just .025. This is illustrated in Fig. 76B. The two values so determined are the confidence limits for the population mean.

[Fig. 76. A, Locating lower bound; B, Locating upper bound; C, Showing whole confidence interval (1,050.8 to 1,129.2, about the sample mean 1,090).]

Since the sampling distribution of the mean is normal, the probability of a sample mean exceeding the population mean by 1.96 or more times the standard deviation of the sampling distribution of the mean, that is, the standard error of the mean, as it is called, is .025; the same is true of a sample value falling short of it by 1.96 standard errors or more. Since the standard error of the mean is the standard deviation of the population divided by the square root of N, the confidence limits for the population mean may be found in this instance by simply laying off ±1.96σ̃/√N from the sample mean value.
For the given problem, these limits would be equal to

$$1{,}090 \pm 1.96\,\frac{200}{\sqrt{100}}$$

or 1,129.2 and 1,050.8 kilowatt-hours. This interval is symmetrical with respect to the sample mean value and does not tend to overestimate the true value more than it tends to underestimate it.
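The same arithmetic as a short Python sketch, again using `statistics.NormalDist` for the normal percentage point (the text's 1.96 is a rounding of 1.95996...). The helper name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, pop_sd, n, confidence=0.95):
    """Symmetrical confidence limits for the mean when the population s.d. is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for .95
    half_width = z * pop_sd / math.sqrt(n)             # z times the standard error
    return sample_mean - half_width, sample_mean + half_width


low, high = z_interval(1090, 200, 100)
print(round(low, 1), round(high, 1))  # 1050.8 1129.2
```

Because the half-width is added to and subtracted from the same sample mean, the interval is symmetrical about 1,090 by construction.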


[Fig. 77. A, Locating upper bound (1,122.9, above the sample mean 1,090); B, Showing the confidence interval.]


If the manufacturer is concerned solely with the possibility of overestimating the true mean value, he may desire only an upper confidence limit. The confidence interval will presumably run from zero to this upper bound. If the upper bound is selected as that value such that if it is the true mean value the probability of getting the sample mean value or a lower value is just .05, then this confidence interval may be said to have a probability of .95 of covering the true mean value. Since the probability of a normally distributed variate falling short of its true mean value by 1.645 standard deviations or more is just .05, the upper limit of the confidence interval may be found by adding 1.645 times the standard error of the mean to the sample mean value. This is illustrated in Fig. 77. For the given problem, this yields

$$1{,}090 + 1.645\,\frac{200}{\sqrt{100}} = 1{,}122.9$$

It may be said, then, that there is a probability of .95 that the range 0 to 1,122.9 covers the true mean value.

If the manufacturer wished not to underestimate the true mean value, he would reverse the foregoing process. He would subtract 1.645 times the standard error of the mean from the sample mean value to obtain a lower bound for the true mean value. This is illustrated in Fig. 78 and would be

$$1{,}090 - 1.645\,\frac{200}{\sqrt{100}} = 1{,}057.1$$

He could then say that there was a probability of .95 that the range 1,057.1 to ∞ covered the true mean value.
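The two one-sided bounds differ from the symmetrical limits only in putting the whole .05 in one tail, so the normal point becomes 1.645 instead of 1.96. A minimal Python sketch; the helper name `one_sided_bounds` is hypothetical.

```python
import math
from statistics import NormalDist


def one_sided_bounds(sample_mean, pop_sd, n, confidence=0.95):
    """Return (upper bound for the interval 0 to U, lower bound for L to infinity)."""
    z = NormalDist().inv_cdf(confidence)  # about 1.645 for .95
    std_error = pop_sd / math.sqrt(n)
    return sample_mean + z * std_error, sample_mean - z * std_error


upper, lower = one_sided_bounds(1090, 200, 100)
print(round(upper, 1), round(lower, 1))  # 1122.9 1057.1
```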


[Fig. 78. A, Locating lower bound (1,057.1, below the sample mean 1,090); B, Showing the confidence interval (1,057.1 to ∞).]


... and N. It is to be noted ... to cover the true value if he is willing to .... Since the standard error of the mean is the standard deviation of the population divided by √N, it also is to be noted that, the larger N is, the closer the confidence limits lie to the sample mean.* This is another way of saying that, the larger the size of the sample, the more confidence one can have that the true mean value is somewhere in the immediate neighborhood of the sample mean value.¹ These points should be carefully noted.

* This, of course, is true only of the upper bound in Fig. 77 and the lower bound in Fig. 78.

Maximum-likelihood Estimate of the Population Mean. The Problem When Population Variance Is Known. In addition to setting up a range that might reasonably be presumed to include the population mean, the manufacturer of electric-light bulbs may wish to have a single estimate of the population mean that could be viewed as the "best" estimate to be made of this population parameter. As pointed out in the preview of theory,² one way of making such a single estimate is to take that value for the population mean that makes the probability of the sample mean a maximum.³ An estimate so made is a maximum-likelihood estimate.

Since the sampling distribution of the mean is normal, it follows that the probability of a sample mean is the greatest when the sample mean has the same value as the population mean. If, therefore, in estimating the population mean from the sample mean, the former is taken equal to the latter, then the probability of the given sample mean is a maximum.

In the problem under discussion the mean of the sample of electric bulbs was 1,090 kilowatt-hours. The maximum-likelihood estimate of the population mean is therefore X̃ = 1,090 kilowatt-hours.
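The maximum-likelihood argument can be illustrated by brute force: evaluate the normal density of the observed sample mean under a grid of candidate population means and keep the candidate that makes it largest. In this Python sketch the grid (1,080 to 1,100 in steps of .5) is an arbitrary choice for illustration, and `density_of_sample_mean` is a hypothetical helper. The largest value found equals √N/(σ̃√(2π)).

```python
import math
from statistics import NormalDist


def density_of_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """Normal density of the sample mean when the population mean is pop_mean."""
    return NormalDist(pop_mean, pop_sd / math.sqrt(n)).pdf(sample_mean)


candidates = [1080 + 0.5 * k for k in range(41)]  # 1080.0, 1080.5, ..., 1100.0
best = max(candidates, key=lambda m: density_of_sample_mean(1090, m, 200, 100))
print(best)  # 1090.0
```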
SAMPLING DISTRIBUTION OF √N(X̄ − X̃)/σ̂

Used to Test a Hypothesis. The Problem When the Population Variance Is Unknown. The analysis of the preceding section was based on the assumption that the variance of the population is known. If the variance of the population is not known and the hypothesis to be tested does not prescribe any value for this parameter, the sampling distribution of the statistic √N(X̄ − X̃)/σ̂ should be used.⁴


¹ A more precise statement is this: The larger the value of N, the greater the probability that a given finite confidence interval, however small, will include the true value.
² See p. 180.
³ For then P(X̄) would equal √N/(σ̃√(2π)), which is the highest value it can have.
⁴ This is called a "composite hypothesis"; for it supposes that the sample is not from some one specific population but from any one of a group of populations all of which have the same mean.

For purposes of illustration suppose that the electric-light bulb example is modified as follows: An inventor offers a new process of manufacturing electric-light bulbs that will, he claims, increase the mean length of life to 1,100 kilowatt-hours. In the absence of any prolonged experience with the new process, the inventor makes no claims regarding the variability in length of life from bulb to bulb, and there is no reason a priori to believe that the variability of bulbs manufactured by the new process will be the same as that of bulbs manufactured by the old. The latter, it will be recalled, was the assumption of the preceding section. Furthermore, suppose the manufacturer to whom the process is offered is interested, not in the variability in length of life of the bulbs, but only in their mean length of life. To test the claim of the inventor he manufactures a batch of 10 bulbs by the new process and finds that the mean length of life of these 10 bulbs is 1,090 kilowatt-hours.¹

In view of this result, is it reasonable to accept the hypothesis that the new process will in general produce bulbs whose mean length of life is 1,100 kilowatt-hours? Note that nothing is said here about the variance in length of life. To put the question another way, is it reasonable to assume that this sample could have come from a population in which the mean length of life was 1,100 kilowatt-hours and the variance was any value whatever?

Since the hypothesis does not prescribe any particular value for the variance of the population and its value is not known, the proper statistic to use in testing the given hypothesis is √N(X̄ − X̃)/σ̂; for in this statistic, only the hypothetical value of the population mean, X̃, enters. The quantity σ̂², it will be recalled, is the maximum-likelihood estimate of the population variance that is made from the sample; it is equal to the variance of the sample times N/(N − 1). For the problem illustrated, let the value of the sample variance be (180)², or 32,400 square kilowatt-hours.²

¹ A small sample is used here purposely (see p. 284).
² This, it is recalled, is calculated from the sample data by the formula σ² = Σ(Xᵢ − X̄)²/N.
Coefficient of Risk. In order to answer the question that
has been posed, the manufacturer must adopt a definite coeffi- 
cient of risk that will measure the chance he is willing to take 
of rejecting the hypothesis when it is actually true. As before, 
suppose he adopts a coefficient of risk equal to .05. 

Testing the Hypothesis with a Symmetrical Region of Rejection. Next the manufacturer must adopt a definite region of rejection with a probability of .05. Such a region will be attained if from among all the possible sample values of √N(X̄ − X̃)/σ̂, as represented by the sampling distribution of this statistic, he chooses a special subset such that the probability of a sample falling in this subset is just .05. Since the various possible sample values of this statistic are distributed in the form of a t distribution, with n = N − 1, the manufacturer will secure a .05 region of rejection if he includes in the region all values of √N(X̄ − X̃)/σ̂ that are numerically greater than 2.262.¹ The value 2.262 is found by consulting a table of the t distribution, in which it is seen that for n = 9 (that is, 10 − 1) the probability of getting an absolute value of t that is equal to or greater than 2.262 is just .05. As shown in Fig. 79, this is a symmetrical region of rejection because the t distribution is symmetrically distributed about its mean of zero.

[Fig. 79. Symmetrical .05 region of rejection: t ≤ −2.262 and t ≥ 2.262.]

If the manufacturer adopts this symmetrical region of rejection, he can test the given hypothesis as follows: This hypothesis is that X̃ = 1,100. The sample mean X̄ is 1,090, N = 10, and the sample standard deviation is 180. The last gives

$$\hat\sigma = 180\sqrt{\tfrac{10}{9}} = 189.7$$

For these values the sample value of √N(X̄ − X̃)/σ̂ is

$$\frac{\sqrt{10}\,(1{,}090 - 1{,}100)}{189.7} = -.167$$

This is greater than −2.262 and less than +2.262, and the sample value does not therefore fall in the region of rejection. This is illustrated by Fig. 79. The sample mean, although less than 1,100, is not sufficiently less than this quantity to cast doubt on 1,100 being the population mean.

¹ See pp. 110-111, 474.
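The same test in a few lines of Python. The critical value 2.262 is copied from the t table as in the text rather than computed, and `t_statistic` is a hypothetical helper name. The sample standard deviation 180 is the divisor-N value, so σ̂ is obtained by multiplying it by √(N/(N − 1)).

```python
import math

T_025_N9 = 2.262  # t-table point for n = 9, .025 in each tail (value quoted in the text)


def t_statistic(sample_mean, hyp_mean, sample_sd, n):
    """Return sqrt(N)(mean - hyp_mean)/sigma_hat together with sigma_hat.

    sample_sd is the divisor-N standard deviation of the sample, so the
    estimate of the population s.d. is sigma_hat = sample_sd * sqrt(N / (N - 1)).
    """
    sigma_hat = sample_sd * math.sqrt(n / (n - 1))
    return math.sqrt(n) * (sample_mean - hyp_mean) / sigma_hat, sigma_hat


t, sigma_hat = t_statistic(1090, 1100, 180, 10)
print(round(sigma_hat, 1), round(t, 3), abs(t) > T_025_N9)  # 189.7 -0.167 False
```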

Other Regions of Rejection. The region of rejection adopted in the foregoing instance was a symmetrical region with respect to the t distribution. This would be the proper region to adopt if the manufacturer were indifferent to whether the true mean value of the new process was above or below the claimed value. Since it is reasonable to suppose that he would be more concerned if the true mean value were less than 1,100 than if it were greater, a region that contains sample values all at the lower end of the distribution is more appropriate for testing the given hypothesis. Such a region would be given by those values of t that (for n = 9) are equal to or less than −1.833. For the t table shows that the probability of a t less than −1.833 is just .05.¹ A picture of this appropriate region and the location of the sample value with reference to it is shown in Fig. 80.

[Fig. 80. Lower .05 region of rejection: t ≤ −1.833.]

In another case a region of rejection lying all at the upper end of the distribution might be the more appropriate. It is suggested that the reader himself think of a case in which this might be true and work out an illustrative problem.

¹ The table gives the probability of an absolute value of t equal to or greater than 1.833 as .10; hence the probability of a t equal to or less than −1.833 is .05.
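For the one-sided region only the comparison changes. The Python sketch below uses the identity √N(X̄ − X̃)/σ̂ = √(N − 1)(X̄ − X̃)/σ from the derivation of the t distribution; 1.833 is again copied from the text's t table, and `in_lower_region` is a hypothetical helper name.

```python
import math

T_05_N9 = 1.833  # t-table point for n = 9 with .05 in one tail (value quoted in the text)


def in_lower_region(sample_mean, hyp_mean, sample_sd, n, t_point=T_05_N9):
    """Return t and whether the sample falls in the lower region t <= -t_point.

    Uses t = sqrt(N - 1)(mean - hyp_mean)/sigma, with sigma the divisor-N s.d.
    """
    t = math.sqrt(n - 1) * (sample_mean - hyp_mean) / sample_sd
    return t, t <= -t_point


t, reject = in_lower_region(1090, 1100, 180, 10)
print(round(t, 3), reject)  # -0.167 False
```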
Case Where Population Variance Is Known Compared with Case Where Population Variance Is Not Known. The difference between the case in which the standard deviation of the population is assumed to be known and the present case, in which it is not assumed to be known, may be made clear by several diagrams. These diagrams show how the various regions of rejection look in terms of Fig. 64 (page 224). Consider first the symmetrical regions of rejection. In the case in which the standard deviation of the population is known, a hypothesis will be rejected if the sample falls in a region in which the mean of the sample deviates from the hypothetical population mean by more than a given amount, say 1.96σ̃/√N, for a .05 symmetrical region. In terms of Fig. 64 this means that all samples falling outside the lines L and U, shown in Fig. 81a, which is a miniature reproduction of Fig. 64 omitting the probabilities, will lead to the rejection of the hypothesis that the population mean is X̃. This is illustrated by the crosshatched portions of Fig. 81a.

In the case in which the standard deviation of the population is not known the hypothesis is rejected if the sample falls in a region for which √N(X̄ − X̃)/σ̂ is numerically greater than the .025 value of t. In terms of Fig. 64, this means that all samples lying between the lines α and β, Fig. 81b, which is also a miniature reproduction of Fig. 64 omitting the probabilities, will lead to the rejection of the hypothesis that X̃ is the mean of the population. This is illustrated by the crosshatched portion of Fig. 81b.

[Fig. 81b.]

Although the regions are symmetrical in both instances, it is to be noted that they do not include the same set of samples. Some samples are common to each, but there are other samples that are included in only one of the two regions.

As shown by Fig. 81b, samples falling in regions marked A would lead to the rejection of the hypothesis when the standard deviation of the population is not known but not when it is known; samples falling in regions C would lead to its rejection when the standard deviation of the population is known but not when it is unknown. The reason why the samples in regions A lead to rejection of the hypothesis despite the fact that their mean values are relatively close to the population mean is that their standard deviations, and hence the estimates of the population standard deviation, are so small that the difference between the hypothetical population mean and the sample mean, small as it actually is, appears to be relatively large. Likewise, although the means of samples falling in regions C differ greatly from X̃, these samples do not lead to rejection of the hypothesis when the standard deviation of the population is not known, despite the great difference between their mean values and that of the population mean, simply because the standard deviations of these samples (and hence the estimates of the population standard deviation) make this difference seem relatively small. Similar remarks apply to the one-sided regions shown in Fig. 82a to c.

[Fig. 82b.]
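The contrast between the two kinds of region can be made concrete with two samples of size 10 tested against X̃ = 1,100, with σ̃ = 200 used when the population standard deviation is treated as known. The sample values are invented for illustration and do not come from the text. The first sample has a mean close to X̃ but a very small standard deviation; the second has a mean far from X̃ but a very large standard deviation. The Python sketch applies both symmetrical .05 tests to each; `both_decisions` is a hypothetical helper.

```python
import math
from statistics import NormalDist

Z_025 = NormalDist().inv_cdf(0.975)  # about 1.96
T_025_N9 = 2.262                     # t-table point for n = 9 quoted in the text


def both_decisions(sample_mean, sample_sd, hyp_mean, pop_sd, n):
    """Return (reject with sigma~ known, reject with sigma~ unknown)."""
    z = math.sqrt(n) * (sample_mean - hyp_mean) / pop_sd          # normal test
    t = math.sqrt(n - 1) * (sample_mean - hyp_mean) / sample_sd  # t test
    return abs(z) > Z_025, abs(t) > T_025_N9


# Hypothetical samples of size 10 (sample s.d. computed with divisor N).
close_mean_small_sd = both_decisions(1080, 15, 1100, 200, 10)
far_mean_large_sd = both_decisions(1250, 600, 1100, 200, 10)
print(close_mean_small_sd)  # (False, True)
print(far_mean_large_sd)    # (True, False)
```

Each test rejects a sample the other accepts, which is the point made above about the two regions not containing the same set of samples.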
Confidence Limits for the Population Mean. Problem When the Population Variance Is Unknown. The determination of confidence limits for the population mean when the population variance is not known proceeds much the same as in the case when the variance is known. The essential difference is that the t distribution is used in place of the normal distribution.

Suppose that a manufacturer finds that the mean length of life of 10 electric-light bulbs manufactured by some process is 1,090 kilowatt-hours and that the standard deviation of this sample is 180 kilowatt-hours. Nothing is known about the standard deviation in length of life of bulbs producible by the process; that is, the variance of the population is unknown. The manufacturer wishes to set confidence limits for the mean of the population. In setting up these limits he adopts a confidence coefficient of, say, .95.

Determination of Confidence Limits. If the manufacturer is indifferent as to whether he overestimates or underestimates the mean of the population, the lower confidence limit may be found by determining the value of X̃ that makes the probability of getting the sample value of √N(X̄ − X̃)/σ̂ or a higher value just equal to .025, and the upper limit by determining the value of X̃ that makes the probability of getting the sample value or a lower value just equal to .025. For if this procedure is always adopted in setting up symmetrical confidence limits, the limits so determined will include the population mean 95 times out of 100 on the average.¹ Since the sampling distribution of √N(X̄ − X̃)/σ̂ is a t distribution, with n in the t formula equal to N − 1, and since, for n = 10 − 1 = 9, the .025 points of the t distribution are ±2.262, these upper and lower limits for X̃ can be found by setting √N(X̄ − X̃)/σ̂ = ±2.262 and solving for X̃. For the given problem this yields the following,

$$\frac{\sqrt{10}\,(1{,}090 - \tilde X)}{180\sqrt{10/9}} = \pm 2.262$$

or X̃ = 1,225.72 as the upper limit and 954.28 as the lower limit. The analysis is illustrated by Fig. 83.
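Solving the two equations for X̃ is simple arithmetic once one notices that σ̂/√N = σ/√(N − 1) = 180/3 = 60. A Python sketch with the table value 2.262 supplied by hand; `t_limits` is a hypothetical name.

```python
import math


def t_limits(sample_mean, sample_sd, n, t_point):
    """Symmetrical limits: solve sqrt(N)(mean - X)/sigma_hat = +/- t_point for X.

    sample_sd is the divisor-N s.d.; sigma_hat / sqrt(N) = sample_sd / sqrt(N - 1).
    """
    est_std_error = sample_sd / math.sqrt(n - 1)
    return sample_mean - t_point * est_std_error, sample_mean + t_point * est_std_error


low, high = t_limits(1090, 180, 10, 2.262)
print(round(low, 2), round(high, 2))  # 954.28 1225.72
```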
If the manufacturer wishes not to overestimate the mean of the population and does not care whether it is underestimated or not, he will seek only an upper bound for his confidence interval; in other words, he will seek an interval that runs from zero to this upper bound. For a confidence coefficient of .95, such an upper bound may be found by determining the value of X̃ that makes the probability of the sample value of √N(X̄ − X̃)/σ̂ or a lower value just equal to .05. Since the lower .05 point of the t distribution (for n = 10 − 1 = 9) is −1.833, the upper bound for X̃ in the present instance is given by

$$\frac{\sqrt{10}\,(1{,}090 - \tilde X)}{180\sqrt{10/9}} = -1.833$$

which gives the value 1,199.98 for X̃. The whole confidence interval is thus 0 to 1,199.98. The analysis is illustrated in Fig. 84.

¹ See pp. 174-179.


[Fig. 84. A, Locating upper bound (scale units: −1.833, 0; original units: 1,090, 1,199.98); B, Showing confidence interval (0 to 1,199.98).]


It might happen in cases of this kind that the manufacturer desires not to underestimate the mean of the population. In such an instance he may wish only to determine a lower bound for the population mean value, the upper "bound" presumably being infinity. For a confidence coefficient of .95, a lower bound of this kind can be found by determining the value of X̃ that makes the probability of the sample value of √N(X̄ − X̃)/σ̂ or a higher value just equal to .05. Since the upper .05 point of the t distribution (for n = 10 − 1 = 9) is +1.833, the lower bound for X̃ in the given problem is yielded by the equation

$$\frac{\sqrt{10}\,(1{,}090 - \tilde X)}{180\sqrt{10/9}} = +1.833$$

which gives X̃ = 980.02. The whole confidence interval is thus 980.02 to ∞. The analysis is illustrated in Fig. 85.

[Fig. 85. Locating the lower bound (scale units: 0, 1.833; original units: 980.02, 1,090) and showing the confidence interval (980.02 to ∞).]
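Both one-sided bounds follow from the same rearrangement as the symmetrical limits, with the one-tail .05 point 1.833 in place of 2.262. A minimal Python sketch; `t_one_sided_bounds` is a hypothetical helper name.

```python
import math


def t_one_sided_bounds(sample_mean, sample_sd, n, t_point):
    """Return (upper bound for interval 0 to U, lower bound for interval L to infinity)."""
    est_std_error = sample_sd / math.sqrt(n - 1)  # equals sigma_hat / sqrt(N)
    return sample_mean + t_point * est_std_error, sample_mean - t_point * est_std_error


upper, lower = t_one_sided_bounds(1090, 180, 10, 1.833)
print(round(upper, 2), round(lower, 2))  # 1199.98 980.02
```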
Maximum-likelihood Estimate of the Population Mean. Problem When the Population Variance Is Unknown. When the variance of the population is not known, the maximum-likelihood estimate of the population mean is based on the t distribution instead of the normal distribution. The procedure, however, is exactly the same. The optimum estimate of X̃ is the value of X̃ that makes the probability of the sample value of √N(X̄ − X̃)/σ̂ a maximum. Since the sampling distribution of this statistic is a t distribution the peak of which always occurs at zero, the given sample value of √N(X̄ − X̃)/σ̂ will have the greatest probability of occurrence when X̃ is such that √N(X̄ − X̃)/σ̂ = 0, that is, when X̃ = X̄. As in the previous case, therefore, the optimum estimate of the population mean is the value of the sample mean.

Use of Normal Curve with Large Samples. The foregoing procedure for testing hypotheses, finding confidence limits, and determining optimum estimates when the variance of the population is not known will give exact results whether the sample is large or small. If the sample is large, however, say 30 or more, approximate results may be obtained by using the normal curve in place of the t distribution. That is, for large samples, √N(X̄ − X̃)/σ̂ may be treated as a standard normal deviate and looked up in a normal table. The basis for the procedure is that when N is large the t distribution is almost identical with the standard normal curve, so that the latter can be used in its place. It is for this reason that the t table does not give values for n > 30.
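How quickly the t distribution approaches the normal curve can be seen by computing the two-tailed .05 point of t for increasing n and comparing it with the normal value 1.96. The Python sketch below integrates the t density with Simpson's rule and finds the point by bisection; it is an illustration under the assumption that this numerical approach is accurate enough, not a replacement for a published table, and its function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_density(t, n):
    """Student's t density with n degrees of freedom."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c) * (1 + t * t / n) ** (-(n + 1) / 2)


def t_cdf(x, n, steps=2000):
    """P(t <= x) by Simpson's rule on [0, |x|] plus symmetry about zero."""
    b = abs(x)
    h = b / steps
    total = t_density(0.0, n) + t_density(b, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, n)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half


def t_upper_point(p, n):
    """Value with upper-tail probability p, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, n) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


points = {n: t_upper_point(0.025, n) for n in (9, 30, 120)}
z_point = NormalDist().inv_cdf(0.975)
for n, value in points.items():
    print(f"n = {n:3d}: t point {value:.3f}, normal point {z_point:.3f}")
```

At n = 30 the t point (about 2.042) already differs from 1.960 by less than .1, which is consistent with the text's rule of thumb.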


SAMPLING DISTRIBUTION OF THE VARIANCE

Testing a Hypothesis. The preceding sections have been concerned with the mean of the population. Consider now a problem concerned with the variance. Suppose an inventor of a new process for the manufacture of electric-light bulbs claims that his process will reduce the variability in length of life of the bulbs, so that their standard deviation will be 180 kilowatt-hours. To test this claim a manufacturer produces 10 bulbs by the new process and finds that the standard deviation of this sample is 190 kilowatt-hours.¹ The question is: In view of the sample result, is the claim of a population standard deviation of 180 kilowatt-hours a reasonable one? Or, to put this in terms of variances, with a sample variance of 36,100 square kilowatt-hours is the claim of a population variance of 32,400 square kilowatt-hours a tenable hypothesis?

¹ A small sample is taken to show that the analysis can be applied to small as well as to large samples. For a simpler method applicable only to large samples, see pp. 289-290.

Coefficient of Risk. Suppose, as in the case of the mean, that 
the manufacturer is willing to run the chance of rejecting hypoth- 
eses of this kind 5 times out of 100 when they are actually true. 
His coefficient of risk is thus .05. 

Regions of Rejection. To assure a coefficient of risk of .05
the manufacturer must select from the whole set of samples 
that might be drawn from the given hypothetical population 
a subset of samples the probability of which is just .05. If he 
rejects the hypothesis whenever the given sample is found 
to belong to this special subset, he will attain a coefficient of risk 
equal to .05. This subset will be his region of rejection. 

Since the manufacturer is interested here in the variance of the electric-light bulbs producible by the new process, the set of all possible samples and the chosen region of rejection can most appropriately be described in terms of the sample variance. When the set of all possible samples is so described, the result is the sampling distribution of the variance. This, as already noted, is a skewed distribution which centers roughly around the variance of the population; its mean is actually (N − 1)/N times σ̃².* If the unit of measurement is taken as σ̃²/N, the distribution assumes the form of a χ² distribution, with n in the χ² equation equal to N − 1.†

Since the manufacturer wishes the variance in bulbs to be as small as possible, he would like to reduce to a minimum the chance of accepting the inventor's claim when in actuality the variance of the new process is above the hypothetical figure. The most appropriate region of rejection for him to select is consequently the region comprising the upper .05 tail of the sampling distribution. A χ² table shows that, for n = 10 − 1 = 9, this comprises all values of χ² greater than 16.919, that is, all samples whose variances are greater than 16.919σ̃²/10, or 1.6919σ̃². This is illustrated in Fig. 86.

* For the mean of Nσ²/σ̃² is N − 1, since the mean of χ² is n.
† See pp. 111-112, 263-263.

Testing the Hypothesis. The hypothesis to be tested in the given instance is that $\boldsymbol{\sigma}^2 = 32{,}400$. The region of rejection thus constitutes all samples whose variances exceed

$$1.6919(32{,}400) = 54{,}818.$$

The given sample variance is 36,100, which is far from this region of rejection. Hence the given hypothesis is not rejected; the claim of the inventor is not disproved.
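The test just described can be reproduced numerically. The following is a minimal sketch in Python, assuming only the standard library; the helper names `chi2_cdf` and `chi2_point` are illustrative, not from the text. In place of a printed $\chi^2$ table it sums the series for the regularized incomplete gamma function and locates percentage points by bisection.

```python
import math


def chi2_cdf(x: float, n: int) -> float:
    """Probability that chi-square with n degrees of freedom is <= x.

    Uses the power series for the regularized lower incomplete gamma
    function P(n/2, x/2), which converges for every x > 0.
    """
    if x <= 0.0:
        return 0.0
    a, t = n / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= t / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(t) - t - math.lgamma(a))


def chi2_point(p: float, n: int) -> float:
    """Value of chi-square below which a proportion p of the distribution lies."""
    lo, hi = 0.0, float(n)
    while chi2_cdf(hi, n) < p:  # widen the bracket until it contains the point
        hi *= 2.0
    for _ in range(200):  # bisection on the increasing distribution function
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Bulb example: N = 10, sample variance 36,100, hypothetical variance 32,400.
N, sample_var, hyp_var = 10, 36_100.0, 32_400.0
critical = chi2_point(0.95, N - 1)    # upper .05 point, about 16.919
boundary = critical * hyp_var / N     # about 54,817.5; the text rounds to 1.6919 first and gets 54,818
statistic = N * sample_var / hyp_var  # N * sample variance / population variance
print(f"chi-square = {statistic:.3f}, critical value = {critical:.3f}")
print(f"reject if sample variance > {boundary:,.0f}; reject here: {statistic > critical}")
```

The same `chi2_point` function returns the other percentage points used below (for example the .02 and .98 points for $n = 9$), so it can stand in for the table when checking the later examples.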

Fig. 86. Upper .05 tail of the sampling distribution for $N = 10$: $\chi^2 = N\sigma^2/\boldsymbol{\sigma}^2 > 16.919$, that is, $\sigma^2 > 54{,}818$ when $\boldsymbol{\sigma}^2 = 32{,}400$.

Other Examples. In testing hypotheses concerning the population variance, the statistician sometimes wishes to minimize the risk of accepting the hypothesis when the true variance is below the hypothetical figure. The appropriate region of rejection is then the lower .05 tail of the sampling distribution. For $N = 10$ ($n = 9$) this comprises all values of $\chi^2$ less than 3.325, that is, all samples whose variances are less than $.3325\boldsymbol{\sigma}^2$; for a hypothetical variance of 32,400 this means all sample variances below 10,773. This is illustrated in Fig. 87.

Fig. 87. Lower .05 tail of the sampling distribution for $N = 10$: $\chi^2 < 3.325$, that is, $\sigma^2 < 10{,}773$ when $\boldsymbol{\sigma}^2 = 32{,}400$.

Cases also occur in which the statistician is as much concerned with accepting the hypothesis when the true variance is below the figure being tested as when it is above. In these cases, the most appropriate region of rejection would be one in which the total probability of .05 is equally distributed at both ends of the distribution. Unfortunately, the $\chi^2$ table has not been constructed so that the points marking the .025 tails of the distribution can be obtained without interpolation. If the coefficient of risk were set at .04, however, tails of .02 each could readily be determined from the existing table. Thus, if $n = 9$, a .04 region of rejection could be taken to constitute all values of $\chi^2$ less than 2.532 and all values of $\chi^2$ greater than 19.679. This region of rejection would comprise all samples whose variances are less than $2.532\boldsymbol{\sigma}^2/10$ or greater than $19.679\boldsymbol{\sigma}^2/10$. Such a region of rejection would afford an even balance between the risk of accepting the hypothesis when the true variance is less than the hypothetical figure and the risk of accepting it when the true variance is greater than this amount. This is illustrated in Fig. 88.

Fig. 88. Two-tailed .04 region of rejection for $N = 10$: $\chi^2 < 2.532$ or $\chi^2 > 19.679$, that is, $\sigma^2 < 8{,}204$ or $\sigma^2 > 63{,}760$ when $\boldsymbol{\sigma}^2 = 32{,}400$.

Confidence Limits for the Population Variance. The Problem. In many cases no particular hypothesis regarding the population variance is to be tested. Instead, it is merely desired to determine a range of values within which, on the basis of a given sample, the variance of the population may reasonably be expected to lie. It is with the determination of such confidence limits that the present section is concerned.
To make the problem concrete suppose that 10 electric-light bulbs are manufactured by the new process referred to in the foregoing example and that the standard deviation in length of life of this sample is found to be 190 kilowatt-hours. This means a variance of 36,100 square kilowatt-hours. Suppose further that the manufacturer who is testing this new process wants to run no greater chance than 4 out of 100 that the confidence limits set up will fail to cover the true value of the variance. That is, he wishes confidence limits whose confidence coefficient is .96.*

Determination of Confidence Limits. If the manufacturer is interested in both an upper and a lower bound for the population variance, he can set up confidence limits as follows: He can obtain an upper limit for $\boldsymbol{\sigma}^2$ by finding the value that makes the quantity $N\sigma^2/\boldsymbol{\sigma}^2$ just equal to the lower .02 point of the $\chi^2$ distribution. For if this value is the true value, then the probability of the sample $\sigma^2$ or some lower value will be just .02. Similarly, a lower bound for $\boldsymbol{\sigma}^2$ can be obtained by finding the value of $\boldsymbol{\sigma}^2$ that makes the quantity $N\sigma^2/\boldsymbol{\sigma}^2$ just equal to the upper .02 point of the $\chi^2$ distribution. For if this value is the true value, then the probability of getting the sample $\sigma^2$ or some higher value will be just .02.

For the given data these upper and lower limits for $\boldsymbol{\sigma}^2$ are determined as follows: For $N = 10$, $n = 9$, the lower .02 point of the $\chi^2$ distribution is 2.532. The upper limit for $\boldsymbol{\sigma}^2$ is thus given by $N\sigma^2/\boldsymbol{\sigma}^2 = 2.532$. This yields

$$\frac{(10)(36{,}100)}{\boldsymbol{\sigma}^2} = 2.532,$$

or $\boldsymbol{\sigma}^2 = 142{,}575$. Similarly, for $n = 9$, the upper .02 point of the $\chi^2$ distribution is 19.679, and the lower limit for $\boldsymbol{\sigma}^2$ is given by $N\sigma^2/\boldsymbol{\sigma}^2 = 19.679$, or $(10)(36{,}100)/\boldsymbol{\sigma}^2 = 19.679$. This yields $\boldsymbol{\sigma}^2 = 18{,}344$. The confidence interval is thus 18,344-142,575, and it may be said that there is a probability of .96 that this interval covers the true variance.

* The confidence coefficient is put at this figure instead of the usual .95 because the $\chi^2$ table gives the .02 points at each end instead of the usual .025 points.
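A sketch of the two-sided calculation, assuming the .02 points 2.532 and 19.679 for $n = 9$ are read from a standard $\chi^2$ table; `variance_limits` is a hypothetical helper name.

```python
def variance_limits(N: int, sample_var: float, chi2_lower_pt: float, chi2_upper_pt: float):
    """Two-sided confidence limits for the population variance.

    Setting N * sample_var / pop_var equal to the upper percentage point of
    chi-square (n = N - 1) gives the lower limit; setting it equal to the
    lower percentage point gives the upper limit.
    """
    ns2 = N * sample_var
    return ns2 / chi2_upper_pt, ns2 / chi2_lower_pt


# Lower and upper .02 points of chi-square for n = 9, from a standard table.
lower, upper = variance_limits(10, 36_100.0, chi2_lower_pt=2.532, chi2_upper_pt=19.679)
print(f"{lower:,.0f} to {upper:,.0f}")  # prints 18,344 to 142,575
```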

The foregoing interval was an unbiased confidence interval in that the probability of failing to cover the population value because the interval was set too low was equal to the probability of failing to cover the population value because the interval was set too high. If the manufacturer had wished to set up a confidence interval that had only an upper bound, he could have found that upper bound by setting $N\sigma^2/\boldsymbol{\sigma}^2$ equal to the .95 point of the $\chi^2$ distribution for $n = N - 1$ (the value exceeded with probability .95). For the given data this yields $10(36{,}100)/\boldsymbol{\sigma}^2 = 3.325$, or $\boldsymbol{\sigma}^2 = 108{,}571$. Hence the manufacturer might say that the range 0-108,571 has a probability of .95 of covering the true value. That is, in cases of this kind the manufacturer would go wrong only 5 per cent of the time in assuming that the population variance was equal to or below the upper bound. It will be noted here that the confidence interval established in this instance had a confidence coefficient of .95 instead of .96 as in the previous example. The .95 coefficient was adopted because the $\chi^2$ table lacks a .96 point.
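The one-sided bound needs only the .95 point of $\chi^2$ for $n = 9$ (3.325). A minimal sketch, with a hypothetical helper name:

```python
def variance_upper_bound(N: int, sample_var: float, chi2_pt: float) -> float:
    """Upper confidence bound for the population variance.

    chi2_pt is the chi-square value (n = N - 1) exceeded with the chosen
    probability, e.g. the .95 point 3.325 for n = 9.
    """
    return N * sample_var / chi2_pt


bound = variance_upper_bound(10, 36_100.0, 3.325)
print(f"0 to {bound:,.0f}")  # prints 0 to 108,571
```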

In conclusion, it should be noted once again that the larger the sample, the narrower the confidence interval. If a sample of 20 had been taken instead of a sample of 10, the limits for $\boldsymbol{\sigma}^2$ would have been 21,433 and 84,277. Thus, the larger the sample, the greater the assurance that the sample variance is near the true variance. It must be remembered, however, that a larger sample may involve increased expense.
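The narrowing of the interval with sample size can be checked the same way. This sketch assumes the $\chi^2$ points for $n = 19$ (8.567 and 33.687), which are standard table values and are not quoted in the text:

```python
def variance_limits(N, sample_var, chi2_lower_pt, chi2_upper_pt):
    """Two-sided limits for the population variance (n = N - 1)."""
    return N * sample_var / chi2_upper_pt, N * sample_var / chi2_lower_pt


# (N, lower .02 point, upper .02 point) of chi-square with n = N - 1.
cases = [(10, 2.532, 19.679), (20, 8.567, 33.687)]
for N, lo_pt, hi_pt in cases:
    low, high = variance_limits(N, 36_100.0, lo_pt, hi_pt)
    print(f"N = {N}: {low:,.0f} to {high:,.0f} (width {high - low:,.0f})")
```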


SPECIAL METHOD FOR LARGER SAMPLES 
As indicated in the preceding chapter, when the sample is large the sampling distribution of the variance is approximately normal, its mean being practically the variance of the population (more exactly, $\frac{N-1}{N}\boldsymbol{\sigma}^2$) and its standard deviation being $\boldsymbol{\sigma}^2\sqrt{2/N}$. Hence for large samples, say samples greater than 30, any hypothetical value of $\boldsymbol{\sigma}^2$ may be tested by noting the value of

$$\frac{\sigma^2 - \boldsymbol{\sigma}^2}{\boldsymbol{\sigma}^2\sqrt{2/N}}.$$

If a symmetrical region of rejection is adopted with a probability of .05, then all values of this ratio lying outside $\pm 1.96$ will lead to a rejection of the hypothesis. If the region of rejection is put at one end, the hypothesis will be rejected if the ratio is greater than 1.645 in one instance or if it is less than $-1.645$ in another instance. Similarly, confidence limits with a coefficient of .95 may be determined by setting

$$\frac{\sigma^2 - \boldsymbol{\sigma}^2}{\boldsymbol{\sigma}^2\sqrt{2/N}}$$

equal to $\pm 1.96$ if both an upper and a lower bound are desired, or to 1.645 if a lower bound only is desired, or to $-1.645$ if an upper bound only is desired.

To illustrate this special method for large samples, suppose that the batch of electric-light bulbs of the previous example had numbered 98 instead of 10. Let the sample variance be 36,100 as before, and consider the hypothesis that the true variance is 32,400 (that is, $\boldsymbol{\sigma} = 180$). If the region of rejection is taken as the upper .05 tail of the normal curve, then this hypothesis will be rejected if the ratio exceeds 1.645, and it may be tested by finding the value of the ratio. For the given data this is equal to

$$\frac{36{,}100 - 32{,}400}{32{,}400\sqrt{2/98}} = .8.$$

Since this is less than 1.645, the sample does not fall in the region of rejection and the hypothesis is not rejected.

For confidence limits, let the upper bound of the population variance be determined so that if the population variance were equal to this upper bound then the probability of the sample variance or some lower value would be just .02, and let the lower bound be determined in the same manner. Then these upper and lower limits will be given by

$$\frac{\sigma^2 - \boldsymbol{\sigma}^2}{\boldsymbol{\sigma}^2\sqrt{2/N}} = \pm 2.054.$$

This yields $\boldsymbol{\sigma}^2 = 27{,}911$ and $\boldsymbol{\sigma}^2 = 51{,}090$ as the limits of the confidence interval. It will be noted that this interval is still smaller than that for which $N = 10$ or $N = 20$.
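The large-sample ratio and the limits obtained by setting it equal to $\pm 2.054$ can be sketched as follows; the function names are illustrative. Solving $(\sigma^2 - v)/(v\sqrt{2/N}) = \pm z$ gives $v = \sigma^2/(1 \pm z\sqrt{2/N})$. Rounding of the normal point accounts for differences of a unit or two from the values in the text.

```python
import math


def large_sample_ratio(sample_var: float, pop_var: float, N: int) -> float:
    """(s2 - pop_var) / (pop_var * sqrt(2/N)): roughly a unit normal deviate
    for large N when pop_var is the true population variance."""
    return (sample_var - pop_var) / (pop_var * math.sqrt(2.0 / N))


def large_sample_limits(sample_var: float, N: int, z: float):
    """Solve the ratio = +z (lower limit) and = -z (upper limit) for pop_var.

    Requires z * sqrt(2/N) < 1, which holds for the large samples meant here.
    """
    c = z * math.sqrt(2.0 / N)
    return sample_var / (1.0 + c), sample_var / (1.0 - c)


N, s2 = 98, 36_100.0
ratio = large_sample_ratio(s2, 32_400.0, N)
print(round(ratio, 2), ratio > 1.645)        # 0.8 False: hypothesis not rejected
low, high = large_sample_limits(s2, N, 2.054)  # 2.054: upper .02 point of the normal curve
print(round(low), round(high))
```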
Maximum-likelihood Estimate of Population Variance. So far maximum-likelihood estimates of population parameters have been found to be the sample values of the corresponding statistics. Thus the maximum-likelihood estimate of the percentage of "favorable" cases in the population is the percentage of "favorable" cases in the sample. Similarly, whether or not the variance of the population is known, the maximum-likelihood estimate of the mean of the population is the mean of the sample. In the present instance, however, the maximum-likelihood estimate of the population variance that can be made independently of the sample mean¹ is not the sample variance but the sample variance multiplied by $N/(N-1)$. The derivation of this result is as follows:

By definition the maximum-likelihood estimate of a population parameter is the value of the parameter that will make the probability of the given sample a maximum. For an assigned value of the population variance the probability of a sample having any given variance may be determined from the sampling distribution of the variance. The equation for this distribution is Eq. (3) of Chap. X. If a particular value is given to $\boldsymbol{\sigma}^2$ in this equation, it will yield the probability of getting samples with various values of $\sigma^2$. If, however, the value of a given sample variance is substituted in this equation, then the formula yields the probability of getting this particular sample for various possible values of $\boldsymbol{\sigma}^2$. Figure 89a, for example, shows the probability of a sample variance of 100 when the population variance is 180, the size of the sample being 11. Figure 89b shows (for $N = 11$) the probability of a sample variance of 100 when the population variance is 80, and Fig. 89c shows this probability when the population variance is equal to

$$\frac{N\sigma^2}{N-1} = \frac{11}{10}(100),$$

that is, to 110. Further calculations will show that the probability of the given $\sigma^2$ is greater when $\boldsymbol{\sigma}^2$ has this last value than when it has any other value.² This is indicated by the curve shown in Fig. 90. The optimum estimate of $\boldsymbol{\sigma}^2$ when $\sigma^2 = 100$ and $N = 11$ is thus 110.

Fig. 89a. Sampling distribution of $\sigma^2$ for $\boldsymbol{\sigma}^2 = 180$ ($N = 11$); horizontal axis $\sigma^2$, 0 to 300.
Fig. 89b. The same for $\boldsymbol{\sigma}^2 = 80$.
Fig. 89c. The same for $\boldsymbol{\sigma}^2 = \frac{11}{10}(100) = 110$.

¹ When the mean and standard deviation are estimated jointly then the maximum-likelihood estimate of the population variance is also the variance of the sample.
The algebraic counterpart of this geometrical explanation is as follows: For a given $\sigma^2$, Eq. (3) of Chap. X gives the probability of that $\sigma^2$ as a function of $\boldsymbol{\sigma}^2$. If this probability is to be a maximum for $\boldsymbol{\sigma}^2$, then the derivative of the function with respect to $\boldsymbol{\sigma}^2$ must be zero at this maximizing value. To simplify matters the logarithm of the probability is differentiated instead of the probability itself, but this does not alter the result since the probability will be a maximum when its logarithm is a maximum. Thus, from Eq. (3), Chap. X,

$$\log[P(\sigma^2)] = -\frac{N\sigma^2}{2\boldsymbol{\sigma}^2} - \frac{N-1}{2}\log\boldsymbol{\sigma}^2 + \text{terms not involving } \boldsymbol{\sigma}^2.$$

Differentiating this with respect to $\boldsymbol{\sigma}^2$ and setting the result equal to 0 yields the following:

$$\frac{d\log[P(\sigma^2)]}{d\boldsymbol{\sigma}^2} = \frac{N\sigma^2}{2\boldsymbol{\sigma}^4} - \frac{N-1}{2\boldsymbol{\sigma}^2} = 0,$$

or

$$\boldsymbol{\sigma}^2 = \frac{N\sigma^2}{N-1}.$$

² "The probability of a sample variance of 100" is merely a short way of saying "the probability of a sample variance lying between 100 and $100 + d\sigma^2$." Although the true probability is an area, it is only the ordinate that changes from case to case in this argument, and the ordinate is therefore all that needs to be considered here.


It may seem strange at first thought that the optimum value of $\boldsymbol{\sigma}^2$ is not the value that makes the given $\sigma^2$ fall at the mode or peak of the sampling distribution. On more careful examination, however, it is seen that the sampling distribution of $\sigma^2$ changes not only its position but also its shape with changes in the value assigned to $\boldsymbol{\sigma}^2$. The height at the mode of one curve may thus be less than the height at some other point on another curve. In the present instance it turns out that the height of the curve given by $\boldsymbol{\sigma}^2 = N\sigma^2/(N-1)$ at the point $\sigma^2$ is greater than the maximum height of the curve whose mode occurs at $\sigma^2$.

Fig. 90. Likelihood of $\boldsymbol{\sigma}^2$ for the given sample variance, plotted against $\boldsymbol{\sigma}^2$.
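The geometrical argument can be checked numerically. The sketch below evaluates the log-likelihood derived above (terms not involving $\boldsymbol{\sigma}^2$ dropped) over a grid of trial values for $\sigma^2 = 100$, $N = 11$, and compares the maximum with the value of $\boldsymbol{\sigma}^2$ that places the sample variance at the mode of its sampling distribution. It assumes the mode of $\chi^2$ with $n$ degrees of freedom is $n - 2$, so that mode lies at $N\sigma^2/\boldsymbol{\sigma}^2 = N - 3$; the grid search is only an illustration, not the text's method.

```python
import math


def log_likelihood(pop_var: float, sample_var: float, N: int) -> float:
    """log P(sample variance | pop_var) without terms free of pop_var:
    -N*s2/(2*pop_var) - (N-1)/2 * log(pop_var)."""
    return -N * sample_var / (2.0 * pop_var) - (N - 1) / 2.0 * math.log(pop_var)


N, s2 = 11, 100.0
grid = [v / 10.0 for v in range(500, 3001)]  # trial values 50.0 to 300.0, step 0.1
best = max(grid, key=lambda v: log_likelihood(v, s2, N))
print(best, N * s2 / (N - 1))  # both 110.0

# Population variance whose sampling distribution has its mode at s2 = 100.
mode_var = N * s2 / (N - 3)  # 137.5
print(log_likelihood(110.0, s2, N) > log_likelihood(mode_var, s2, N))  # True
```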
The explanation of this somewhat unexpected result for the maximum-likelihood estimate of $\boldsymbol{\sigma}^2$ lies in the fact that the variance of a sample is always measured from the mean of the sample and not the mean of the population. Since the mean itself varies from sample to sample, any estimate of the population variance that ignores this variation in the mean will tend to underestimate its true value. It will be seen in Chap. XIV that, when the mean of the population and the variance of the population are estimated jointly, the joint maximum-likelihood estimates are $\hat{\bar{\boldsymbol{X}}} = \bar{X}$ and $\hat{\boldsymbol{\sigma}}^2 = \sigma^2$.


SAMPLING DISTRIBUTION OF THE RANGE 


Table XIII of the Appendix,¹ giving pertinent facts regarding the sampling distribution of the range, may be used to estimate the population standard deviation in a quick and ready manner; for the values given in the table are for $w = (X_1 - X_N)/\boldsymbol{\sigma}$, and also the mean values for $w$ in samples of varying size are given. On the assumption that a given sample range is close to the mean range for samples of the given size, it follows that

$$\frac{\text{sample range}}{\text{mean value of } w \text{ for samples of that size}}$$

will give a rough estimate of $\boldsymbol{\sigma}$. Suppose, for example, that the range of a sample of 10 cases is 50; then 50/3.078, or 50(.325) = 16.25, is a rough estimate of $\boldsymbol{\sigma}$.

Table XIII shows that, in terms of $\boldsymbol{\sigma}$, the mean range for samples of 8 is $2.847\boldsymbol{\sigma}$. It also shows that the standard error of the range for such samples is $.820\boldsymbol{\sigma}$. Hence the standard error of a mean of 10 sample ranges would be $.820\boldsymbol{\sigma}/\sqrt{10}$. In the present instance, therefore, the .95 confidence interval for $\boldsymbol{\sigma}$ is given by

$$\frac{\bar{w} - 2.847\boldsymbol{\sigma}}{.820\boldsymbol{\sigma}/\sqrt{10}} = \pm 1.96,$$

where $\bar{w}$ is the mean of the 10 sample ranges (the latter figures being the upper and lower .025 probability points of the normal curve). The confidence interval for $\boldsymbol{\sigma}$ is accordingly 6.8 to 9.7.

¹ See p. 482. Note $X_1$ = largest, $X_N$ = smallest case.
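The range calculations can be sketched as below. The constants are only those quoted in the text from Table XIII; the dictionary and function names are hypothetical. The confidence limits come from solving the normal-deviate equation for $\boldsymbol{\sigma}$ at $+1.96$ and $-1.96$.

```python
import math

# Constants quoted in the text from Table XIII, in units of the population
# standard deviation: mean range for samples of size n, and standard error
# of the range for samples of 8.
MEAN_RANGE = {8: 2.847, 10: 3.078}
SD_RANGE = {8: 0.820}


def sigma_from_range(sample_range: float, n: int) -> float:
    """Rough estimate of sigma: sample range / mean range for samples of n."""
    return sample_range / MEAN_RANGE[n]


def sigma_limits(mean_range: float, n: int, k: int, z: float = 1.96):
    """Solve (mean_range - d2*sigma) / (d3*sigma/sqrt(k)) = +z and -z for sigma,
    where mean_range is the average of k ranges, each from a sample of n."""
    spread = z * SD_RANGE[n] / math.sqrt(k)
    return mean_range / (MEAN_RANGE[n] + spread), mean_range / (MEAN_RANGE[n] - spread)


print(round(sigma_from_range(50, 10), 2))  # 16.24; the text's 50(.325) gives 16.25
```

Whatever the observed mean range $\bar{w}$ of 10 ranges from samples of 8, `sigma_limits(w_bar, 8, 10)` returns approximately $\bar{w}/3.355$ and $\bar{w}/2.339$.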

The sampling distribution of the range may also be used for quick analysis in certain problems in which the variation in means of particular groups is being tested for significance.¹ For example, suppose that five classes of 12 students each have average grades of 77.25, 77.83, 70.02, 69.92, and [...], and the ranges of individual grades for the five classes are 31, 34, 24, 44, and 44 points. On the basis of the variation of grades indicated by the ranges, is the variation in mean grades more than might reasonably be attributed to chance? This is a question that may readily be answered as follows:

The mean of the five ranges is 35.4 points. From Table XIII of the Appendix it is seen that for groups of 12 the mean range is $3.258\boldsymbol{\sigma}$. Hence 35.4/3.258 = 10.86 may serve as a rough estimate of the population standard deviation $\boldsymbol{\sigma}$. Now, if $\boldsymbol{\sigma} = 10.86$, then means of 12 grades should have a standard deviation of $10.86/\sqrt{12} = 3.135$, and Table XIII of the Appendix shows that for groups of five the mean range for a variable whose $\boldsymbol{\sigma}$ is 3.135 is $2.326\boldsymbol{\sigma} = 2.326 \times 3.135 = 7.292$. For the five mean grades given above the range is 7.91 and is thus just about what would be expected on the basis of chance. Hence the analysis of the variation in mean grades based on the ranges of grades in the individual classes indicates that the variation in means is apparently no greater than may reasonably be attributed to chance. This is identical with the conclusion reached in Chap. XVII after a more refined analysis of variance. The rougher analysis presented here is especially valuable as a preliminary analysis for throwing out those cases in which the variation in means is shown by the rougher analysis to be clearly not due to chance.
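The quick range analysis of the five classes can be traced step by step. In this sketch the range of the class means (7.91) is taken from the text as given; results differ from the text's in the last digit because intermediate values are not rounded.

```python
import math

ranges = [31, 34, 24, 44, 44]             # ranges of grades in five classes of 12
mean_range = sum(ranges) / len(ranges)     # 35.4
sigma_hat = mean_range / 3.258             # mean range for groups of 12 is 3.258 sigma
sigma_means = sigma_hat / math.sqrt(12)    # standard deviation of means of 12 grades
expected_range = 2.326 * sigma_means       # mean range for groups of 5 is 2.326 sigma
observed_range = 7.91                      # range of the five class means, from the text
print(round(mean_range, 1), round(sigma_hat, 2), round(sigma_means, 3))
print(f"expected range of means {expected_range:.2f}, observed {observed_range}")
```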

The various percentage points of the sampling distribution of the range may also be useful in setting up charts to control the quality of industrial mass production. An interesting discussion of the use of the range in industrial quality control may be found in reports issued by the British Standards Institution.¹

¹ See Smith, J. G., and A. J. Duncan, Elementary Statistics and Applications, Chap. XV, for discussion of correlation ratio. Also, see Chap. XVIII below on Analysis of Variance.

Use of Sampling Distributions of $\sqrt{\beta_1}$, $\beta_2$ (or $g_1$, $g_2$) and $a$ to Test for Departure from Normality. As indicated in the previous chapter, the sampling distributions of $\sqrt{\beta_1}$ and $\beta_2$ (or $g_1$ and $g_2$ if the worker prefers these more refined statistics) and of $a = \text{A.D.}/\sigma$ may be used to test departure from normality. For illustrative purposes consider the data on the weights of 300 Princeton freshmen² discussed in Chap. VII.

² See pp. 138, 141.

For these data,

$$\sqrt{\beta_1} = .687, \qquad \beta_2 = 4.6720, \qquad g_1 = .635, \qquad g_2 = 1.7320,$$

and $a = .7640$. Tests of departure from normality may be carried out as follows. For samples of 300, the ratios of $\sqrt{\beta_1}$ and $\beta_2 - 3$ to their standard errors are about 4.5 and

$$\frac{\beta_2 - 3}{\sigma_{\beta_2}} = \frac{4.6720 - 3}{.283} = 5.9,$$

respectively. Both these indicate a marked departure from normality in that the probability of either $\sqrt{\beta_1}$ or $\beta_2 - 3$ exceeding its standard error by as much as 4.5 to 5.9 times is very small. For samples as large as 300 there is little difference between $g_1$, $g_2$ and $\sqrt{\beta_1}$, $\beta_2 - 3$, or between their standard errors. In the case in hand, $\sigma_{g_1} = .141$, $\sigma_{g_2} = .281$, $g_1/\sigma_{g_1} = .635/.141 = 4.5$, and $g_2/\sigma_{g_2} = 1.7320/.281 = 6.2$; therefore, the conclusions are not modified by the use of the more refined statistics $g_1$ and $g_2$. Table XI of the Appendix¹ shows that the sample value of $\beta_2 = 4.6720$ is considerably above the upper 1 per cent point, and Table XII of the Appendix² shows that the sample value of $a = .7640$ is below the lower percentage points tabled for $a$. Hence all the tests indicate a distinct departure from normality.


¹ See p. 480.
² See p. 481.
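The standard errors quoted for $g_1$ and $g_2$ agree with the usual exact formulas for samples from a normal population, and $\sqrt{24/N}$ is the usual large-sample standard error of $\beta_2$. A minimal sketch, with those formulas stated as assumptions in the docstrings:

```python
import math


def se_g1(N: int) -> float:
    """Standard error of g1 in samples of N from a normal population
    (assumed formula: sqrt(6N(N-1) / ((N-2)(N+1)(N+3))))."""
    return math.sqrt(6.0 * N * (N - 1) / ((N - 2) * (N + 1) * (N + 3)))


def se_g2(N: int) -> float:
    """Standard error of g2 in samples of N from a normal population
    (assumed formula: sqrt(24N(N-1)^2 / ((N-3)(N-2)(N+3)(N+5))))."""
    return math.sqrt(24.0 * N * (N - 1) ** 2 / ((N - 3) * (N - 2) * (N + 3) * (N + 5)))


N = 300
print(round(se_g1(N), 3), round(se_g2(N), 3))                # 0.141 0.281, as in the text
print(round(0.635 / se_g1(N), 1), round(1.7320 / se_g2(N), 1))
print(round((4.6720 - 3) / math.sqrt(24.0 / N), 1))           # beta2 - 3 over sqrt(24/N)
```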


CHAPTER XII 


SAMPLING FLUCTUATIONS 
IN CORRELATION STATISTICS 


SAMPLING DISTRIBUTION OF THE CORRELATION COEFFICIENT r 


Correlation coefficients have sampling distributions that are different from any yet discussed. Thus, if a large number of samples¹ are taken at random from a normal bivariate population and if a frequency distribution is made of the sample values of $r_{12}$, the shape of this distribution will be found to conform to the equation²

$$y = \frac{(1-\boldsymbol{r}_{12}^2)^{(N-1)/2}\,(1-r_{12}^2)^{(N-4)/2}}{\pi\,(N-3)!}\;\frac{d^{N-2}}{d(r_{12}\boldsymbol{r}_{12})^{N-2}}\left[\frac{\cos^{-1}(-r_{12}\boldsymbol{r}_{12})}{\sqrt{1-r_{12}^2\boldsymbol{r}_{12}^2}}\right] \qquad (1)$$

where $\boldsymbol{r}_{12}$, in boldface type, refers to the population correlation coefficient, and $d^{N-2}/d(r_{12}\boldsymbol{r}_{12})^{N-2}$ means the $(N-2)$th derivative of the expression in brackets with respect to $r_{12}\boldsymbol{r}_{12}$.

It is to be especially noted that the shape of this distribution depends not only on $N$, the size of the samples, but also on the population coefficient of correlation, $\boldsymbol{r}_{12}$. For small values of $\boldsymbol{r}_{12}$ the distribution is practically symmetrical; for large absolute values of $\boldsymbol{r}_{12}$, however, the distribution is very skewed. This is illustrated in Fig. 91. The character of the sampling distribution of $r_{12}$ is in part related to the fact that $r_{12}$ and $\boldsymbol{r}_{12}$ must always be between $-1$ and $+1$. Hence if $\boldsymbol{r}_{12}$ is close to $+1$, say .94, then the range of values above $\boldsymbol{r}_{12}$ that might possibly be taken by the sample $r_{12}$ is very small compared with the range of values that might be taken by $r_{12}$ below $\boldsymbol{r}_{12}$, and vice versa if $\boldsymbol{r}_{12}$ is close to $-1$. Even for large values of $N$ the distribution of $r_{12}$ remains very skewed when $\boldsymbol{r}_{12}$ is close to $+1$ or $-1$.

Fig. 91. Sampling distributions of $r_{12}$ for $\boldsymbol{r}_{12} = -0.6$, $\boldsymbol{r}_{12} = 0$, and $\boldsymbol{r}_{12} = 0.6$.

¹ A sample from a monovariate or univariate population consists of a group of individual values; a sample from a bivariate population consists of a group of pairs of values. For example, a sample from the univariate population consisting of white male heights might be represented by the measurements 68, 70, 69, 74, 70, 71, 72, 71, 68, 69 inches, each of which represents the height of a particular individual. A sample from the bivariate population consisting of white male heights and weights might be represented by pairs of measurements:

[...] inches
140 150 149 206 165 183 190 175 154 142 pounds

each pair consisting of the height and weight of a particular individual. It is to be noted that each of the above samples is said to be a sample of 10, although the second contains 20 measurements. They are each a sample of 10 in the sense that measurements are made only of 10 individuals.

² Cf. Fisher, R. A., "On the 'Probable Error' of a Coefficient of Correlation Deduced from a Small Sample," Metron, Vol. 1 (1920-1921), Part 4, pp. 1-82.

Fortunately it is possible to "transform" the sampling distribution of $r_{12}$ into a more useful form. Thus, R. A. Fisher has shown that if $z_{12}$ is defined as

$$z_{12} = \tfrac{1}{2}\left[\log_e(1 + r_{12}) - \log_e(1 - r_{12})\right],$$

that is, as $\tanh^{-1} r_{12}$ (read "inverse hyperbolic tangent of $r_{12}$"), then for all but very small values of $N$ the sampling distribution of $z_{12}$ is practically normal in form, with a mean approximately equal to the population value $\boldsymbol{z}_{12}$, that is, to

$$\boldsymbol{z}_{12} = \tfrac{1}{2}\left[\log_e(1 + \boldsymbol{r}_{12}) - \log_e(1 - \boldsymbol{r}_{12})\right] = \tanh^{-1}\boldsymbol{r}_{12},$$

and a standard deviation equal approximately to $1/\sqrt{N-3}$.*

Accordingly, if this "z transformation" is used, a sampling distribution can be obtained that for all but very small samples is practically independent of the population correlation coefficient and is moreover approximately normal in form. The z transformation thus makes it unnecessary to use elaborate tables of the distribution of $r_{12}$ in making estimates and testing hypotheses regarding the population correlation coefficient.

Again it is to be noted that all the foregoing depends on the assumption that the population of $X_1$ and $X_2$ values is distributed in the form of a joint normal frequency distribution.¹

The most practicable procedure for testing hypotheses regarding $\boldsymbol{r}_{12}$ is to use the z transformation. For, as pointed out above, the sampling distribution of $z$ is practically normal in form, and tables of the normal curve are readily available. Since $z$ is normally distributed, the methods used for testing hypotheses regarding any normally distributed variable can be applied. Suppose, for example, that in a sample of 81 the correlation between first- and second-semester English grades was found to be $r_{12} = .89576$. A table of hyperbolic tangents (see Appendix, Table XIV) shows that the angle for which .89576 is the hyperbolic tangent (it will be recalled that $r_{12} = \tanh z_{12}$, for $z_{12} = \tanh^{-1} r_{12}$) is approximately 1.4506. If tables are not available, $z$ may be calculated from the equation

$$z_{12} = \frac{\log_{10}(1 + r_{12}) - \log_{10}(1 - r_{12})}{2(.43429)}.$$

For the given case this yields $z = 1.45049$, which checks to three places with the value derived from the table.¹ Here the sample value of $z_{12}$ will be taken as that given by the table, viz., $z_{12} = 1.4506$.

* The distribution of $z$ may be obtained by substituting $r_{12} = \tanh z_{12}$ in Eq. (1). The mean of the distribution of $z$ is always, except when $\boldsymbol{r}_{12} = 0$, slightly greater in absolute value than the population value of $\boldsymbol{z}_{12}$, the "bias" being given roughly by $\boldsymbol{r}_{12}/[2(N-1)]$; for $\boldsymbol{r}_{12} = 0$ the distribution is perfectly symmetrical. A more accurate formula that indicates the closeness of the approximation for the standard deviation is as follows: [...]

¹ For nonnormal cases see discussion, p. 451.
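The two ways of obtaining $z$ can be compared directly with Python's `math.atanh`. A minimal sketch; computed in full precision, $z$ for .89576 is about 1.4503, which agrees to three places with the values above.

```python
import math

r = 0.89576
z_atanh = math.atanh(r)  # inverse hyperbolic tangent
# Common-logarithm form from the text; .43429 is log10(e).
z_log10 = (math.log10(1 + r) - math.log10(1 - r)) / (2 * 0.43429)
sd_z = 1 / math.sqrt(81 - 3)  # approximate standard deviation of z for N = 81
print(f"{z_atanh:.5f} {z_log10:.5f} {sd_z:.5f}")
```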


Suppose it is argued that the true correlation coefficient between first- and second-semester English grades is .9900. How does this hypothesis fare with reference to the sample coefficient of .89576? To test this hypothesis convert both these correlation coefficients to their corresponding $z$ values. The table of hyperbolic tangents indicates that $z = \tanh^{-1} .9900 = 2.6465$ and, as previously, $z = \tanh^{-1} .89576 = 1.4506$. Hence the hypothetical $z$ is 2.6465, and the sample $z$ is 1.4506. The standard deviation of $z$ is

$$\sigma_z = \frac{1}{\sqrt{N-3}} = \frac{1}{\sqrt{78}} = .11323.$$

Since $z$ is normally distributed, symmetrical regions of rejection would be bounded by the hypothetical $z$ plus or minus $1.96\sigma_z$, or in this particular case by $2.6465 \pm .22193$, that is, by 2.8684 and 2.4246. Since the sample value falls below 2.4246, the hypothesis would not be accepted.
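A sketch of the test, assuming a hypothetical helper `z_test_correlation` that returns the sample $z$, the limits of the acceptance region, and whether the hypothesis is rejected:

```python
import math


def z_test_correlation(r_sample: float, r_hyp: float, N: int, z_crit: float = 1.96):
    """Two-sided test of a hypothetical population correlation via Fisher's z.

    Returns (sample z, lower limit, upper limit, rejected?), where the limits
    are the hypothetical z plus or minus z_crit / sqrt(N - 3).
    """
    z_s = math.atanh(r_sample)
    z_h = math.atanh(r_hyp)
    half_width = z_crit / math.sqrt(N - 3)
    lo, hi = z_h - half_width, z_h + half_width
    return z_s, lo, hi, not (lo <= z_s <= hi)


z_s, lo, hi, rejected = z_test_correlation(0.89576, 0.99, 81)
print(f"sample z = {z_s:.4f}; acceptance region {lo:.4f} to {hi:.4f}; rejected: {rejected}")
```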

Confidence Interval for $\boldsymbol{r}_{12}$. In order to determine a confidence interval for $\boldsymbol{r}_{12}$ from the given sample, it is necessary merely to determine a confidence interval for the population $\boldsymbol{z}_{12}$ and convert this into a value for $\boldsymbol{r}_{12}$. Thus, suppose that symmetrical confidence limits are desired (symmetrical in the sense that failure to cover the true value is equally probable at one end and at the other). For $\boldsymbol{z}_{12}$ these are given by the sample $z_{12} \pm 1.96\sigma_z$. In the present instance,

$$\sigma_z = \frac{1}{\sqrt{N-3}} = \frac{1}{\sqrt{78}} = .11323, \qquad 1.96\sigma_z = .22193.$$

Accordingly, the upper confidence limit for $\boldsymbol{z}_{12}$ is

$$1.4506 + .22193 = 1.6725,$$

and the lower confidence limit for $\boldsymbol{z}_{12}$ is

$$1.4506 - .22193 = 1.2287.$$

Interpolating in a table of hyperbolic tangents shows that the hyperbolic tangent of 1.6725 is .93188 and the hyperbolic tangent of 1.2287 is .84220. These values may also be obtained from the inverse of the equation for the z transformation given above, viz.,

$$r_{12} = \tanh z_{12} = \frac{e^{2z_{12}} - 1}{e^{2z_{12}} + 1}.$$

Thus for the upper limit, $2z_{12}$ equals 3.3450; $\log_{10} e^{3.3450}$ equals $3.3450(.43429) = 1.45270$; and $e^{3.3450}$ equals the antilogarithm of this, or 28.359. Consequently, the upper limit for $\boldsymbol{r}_{12}$ equals $27.359/29.359 = .93188$, which checks the interpolation in the table. Similar calculations would check the .84220 lower limit of $\boldsymbol{r}_{12}$.

In summary, the confidence limits for $\boldsymbol{r}_{12}$ are .84220 and .93188; it may thus be concluded that the range .84220-.93188 covers the true value of $\boldsymbol{r}_{12}$ with a probability of .95. To put it another way, any hypothetical value of $\boldsymbol{r}_{12}$ that is below .84220 or above .93188 is an unreasonable value in view of the sample value $r_{12} = .89576$.

¹ The difference is due to the difference in decimals carried, methods of interpolation, etc., and not to any difference in definition.
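Converting the $z$ limits back to correlations is a single `math.tanh` call. A minimal sketch with a hypothetical helper name; differences from the text in the fifth decimal come from the text's rounded $z$:

```python
import math


def correlation_limits(r: float, N: int, z_crit: float = 1.96):
    """Symmetrical confidence limits for a population correlation coefficient:
    transform r to z = atanh(r), add and subtract z_crit / sqrt(N - 3),
    and transform back with tanh."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(N - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


lower, upper = correlation_limits(0.89576, 81)
print(f"{lower:.5f} to {upper:.5f}")
```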


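One-sided confidence limits may also be set up, and hypotheses may be tested in such a way that the region of rejection is at one end. All this would be a repetition of Chap. XI, however, and need not be repeated here. As mentioned above, once the sampling distribution of the correlation coefficient has been transformed into a normal distribution, all the methods applicable to normally distributed variables can be applied.

Maximum-likelihood Estimate of Population Correlation Coefficient. The maximum-likelihood estimate of the population correlation coefficient that is independent of the means and variances of the population may be obtained from the sampling distribution of $r_{12}$. This independent maximum-likelihood estimate of $\boldsymbol{r}_{12}$ is obtained by so choosing $\boldsymbol{r}_{12}$ that the probability of the sample $r_{12}$, as indicated by Eq. (1), is a maximum.¹ This yields the approximate result²

$$\hat{\boldsymbol{r}}_{12} = r_{12} - \frac{r_{12}(1 - r_{12}^2)}{2(N-1)}. \qquad (2)$$

That is, the sample $r_{12}$ is somewhat too high as an estimate of the population correlation coefficient and needs to be corrected by the subtraction of the quantity $r_{12}(1 - r_{12}^2)/[2(N-1)]$. For example, if $r_{12} = .89576$ and $N = 81$, then the maximum-likelihood estimate of $\boldsymbol{r}_{12}$ is

$$.89576 - \frac{.89576(1 - .89576^2)}{2(80)} = .89576 - .00111 = .89465.$$

When $\boldsymbol{r}_{12}$ is estimated jointly with the means and variances of the population, it is found that the maximum-likelihood estimate of $\boldsymbol{r}_{12}$ is the sample $r_{12}$.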
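Formula (2) is a one-line correction. A sketch, with a hypothetical function name:

```python
def ml_correlation(r: float, N: int) -> float:
    """First approximation (formula 2) to the maximum-likelihood estimate of
    the population correlation that is independent of means and variances."""
    return r - r * (1 - r * r) / (2 * (N - 1))


print(round(ml_correlation(0.89576, 81), 5))  # 0.89465
```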


¹ See pp. 181-182.
² This is obtained by taking the derivative of expression (1) with respect to $\boldsymbol{r}_{12}$ and setting the result equal to zero. The formula is only a first approximation. Better approximations may be found in Karl Pearson's Tables for Statisticians and Biometricians, Vol. II, p. 253. Another approximation that is commonly used is

$$\hat{\boldsymbol{r}}_{12}^2 = \frac{r_{12}^2(N-1) - 1}{N-2}.$$

SAMPLING DISTRIBUTION
OF OTHER CORRELATION COEFFICIENTS

Partial Correlation Coefficients. The analysis that has been made in the preceding section for a simple correlation coefficient can be applied with very little modification to partial correlation coefficients. The only change required in the formula for the standard deviation of $z$ is to subtract from $N$ the number of variables held constant. As before, the practicable procedure is to take $z = \tanh^{-1} r_{12.34\ldots}$ and to treat $z$ as if it were normally distributed. The mean of the distribution of $z$ is again approximately the value $\boldsymbol{z}_{12.34\ldots}$ corresponding to the population correlation coefficient, that is, $\boldsymbol{z} = \tanh^{-1}\boldsymbol{r}_{12.34\ldots}$; but the standard deviation is now the reciprocal of the square root of $N - m - 3$, where $m$ is the number of variables held constant.

To illustrate, consider the determination of confidence limits for the $\boldsymbol{r}_{12.34}$ of the Mount Holyoke data. For these data it was found that $r_{12.34}$ was .80440. For this value of $r$, the value of $z$ is 1.1110. The standard deviation of $z$ in this instance is

$$\frac{1}{\sqrt{81 - 2 - 3}} = \frac{1}{\sqrt{76}} = 0.1147.$$

Accordingly, symmetrical .95 confidence limits for $\boldsymbol{z}$ are $1.1110 + 1.96(0.1147)$ and $1.1110 - 1.96(0.1147)$, or 1.3358 and 0.8862. The $\boldsymbol{r}_{12.34}$ limits corresponding to these two $z$ limits are, respectively, .8707 and .7095.
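The only change from the simple-correlation case is the divisor $N - m - 3$. A sketch of the same steps (the helper name is hypothetical):

```python
import math


def partial_correlation_limits(r: float, N: int, m: int, z_crit: float = 1.96):
    """Confidence limits for a population partial correlation coefficient.

    m is the number of variables held constant, so z = atanh(r) is treated
    as normal with standard deviation 1 / sqrt(N - m - 3).
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(N - m - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


lower, upper = partial_correlation_limits(0.80440, 81, m=2)
print(round(lower, 4), round(upper, 4))
```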


Multiple Correlation Coefficients. When the population multiple correlation coefficient is zero, the statistic

$$\frac{R^2/(k-1)}{(1-R^2)/(N-k)}$$

has a sampling distribution of the form of the F distribution with $n_1 = k - 1$ and $n_2 = N - k$. Here $R$ is the sample multiple correlation coefficient $R_{1.23\ldots}$, $R_{2.13\ldots}$, etc., $k$ is the number of regression constants in the regression equation, and $N$ is the size of the sample. Hence the F distribution can be used to test whether a given multiple correlation coefficient is significantly different from zero.

For the Mount Holyoke data, for example, it was found that $R_{4.123}^2 = .2529$, and the question arises whether this is significantly different from zero. To answer this the hypothesis that the population value is zero (the "null hypothesis," as it is called) may be tested as follows: The number of independent variables is three, and the number of regression constants is one more than this, or four. Hence, $k = 4$ and

$$\frac{R^2}{k-1} = \frac{.2529}{3} = .0843.$$

Since $N = 81$,

$$\frac{1 - R^2}{N - k} = \frac{.7471}{77} = .00970.$$

Therefore,

$$\frac{R^2/(k-1)}{(1-R^2)/(N-k)} = \frac{.0843}{.00970} = 8.69.$$

The F table (see Appendix, page 476) shows that, for $n_1 = 3$ and $n_2 = 60$, the .05 point is 2.758 and, for $n_1 = 3$ and $n_2 = \infty$, the .05 point is 2.605. Accordingly, the .05 point for $n_1 = 3$ and $n_2 = 77$ lies between 2.758 and 2.605. Since the sample value is much greater than this, the null hypothesis is obviously rejected and the multiple correlation coefficient must be declared significantly different from zero. That is, there must be some multiple correlation between the variables concerned.
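The F ratio can be computed as below. Since the standard library has no F-distribution routine, this sketch compares the ratio with the tabled .05 point for $n_2 = 60$ quoted in the text; because the point for $n_2 = 77$ lies between 2.605 and 2.758, exceeding 2.758 is sufficient for rejection. The function name is hypothetical.

```python
def r2_f_statistic(R2: float, N: int, k: int) -> float:
    """F = [R^2/(k-1)] / [(1-R^2)/(N-k)], with n1 = k - 1 and n2 = N - k."""
    return (R2 / (k - 1)) / ((1 - R2) / (N - k))


F = r2_f_statistic(0.2529, 81, 4)
# Tabled upper .05 points for n1 = 3: 2.758 (n2 = 60) and 2.605 (n2 = infinity).
print(round(F, 2), F > 2.758)  # 8.69 True
```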

When the population correlation coefficient is not zero, the distribution of the multiple correlation coefficient is more complicated. This was worked out by R. A. Fisher in 1928.¹ Using Fisher's analysis, Mordecai Ezekiel has drawn a series of charts that give the lower confidence limits for the population multiple correlation coefficient for samples of varying size and varying numbers of independent variables. The confidence coefficient used was .95. These charts are to be found in the Appendix to Ezekiel's book on Methods of Correlation Analysis. For N = 75 and k = 4, for example, Ezekiel's Chart C shows that, if the sample multiple correlation is .57, then the lower confidence limit for the population value is .40. For the Mount Holyoke data the various multiple correlation coefficients are $R_{1.234} = .9056$, $R_{2.134} = .8983$, $R_{3.124} = .6326$, and $R_{4.123} = .5029$. For each of these, N = 81 and k = 4. Ezekiel's Chart C shows that the lower confidence limits for these are approximately .85, .85, .49, and .32, respectively. In other words, the chance is .95 that the ranges .85 to 1, .85 to 1, .49 to 1, and .32 to 1 cover the population value in each case.

The maximum-likelihood estimate of the multiple correlation coefficient is given by the approximate formula

$$\hat{R} = \frac{R(N - 1) - (k - 1)}{N - k}.$$

Thus the maximum-likelihood estimate of $R_{1.234}$ is

$$\hat{R}_{1.234} = \frac{.9056(80) - 3}{77} = .9019.$$

¹ Proceedings of the Royal Society of London, Series A (1928), Vol. 121, pp. …
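The approximate estimate can be evaluated directly. The sketch below applies the formula to the sample R itself, exactly as the worked example does. The function name `ml_multiple_r` is our own label and not a name used in the text.

```python
def ml_multiple_r(r, k, n):
    """Approximate maximum-likelihood estimate [R(N - 1) - (k - 1)] / (N - k),
    applied to the sample R exactly as in the worked example."""
    return (r * (n - 1) - (k - 1)) / (n - k)

estimate = ml_multiple_r(0.9056, k=4, n=81)
print(round(estimate, 4))  # 0.9019
```

The estimate falls slightly below the sample value. The reduction grows as k increases relative to N.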


Correlation Ratio. If the correlation ratio for the population is zero, then the statistic

$$F = \frac{\eta^2/(k - 1)}{(1 - \eta^2)/(N - k)}$$

also has a sampling distribution of the form of an F distribution with $n_1 = k - 1$ and $n_2 = N - k$. Here k stands for the number of means entering into the calculation of $\eta_{12}$.

To illustrate the use of the sampling distribution of $\eta_{12}$, consider again the Mount Holyoke data. For these, $\eta_{12}^2 = .82124$, k = 11, N = 81, and thus

$$F = \frac{.82124/10}{.17876/70} = \frac{.082124}{.002554} = 32.16.$$


To test the hypothesis that the true correlation ratio is 0, take a coefficient of risk equal to .05, and take the upper .05 tail as the region of rejection. For this case, $n_1 = 11 - 1 = 10$ and $n_2 = 81 - 11 = 70$. For $n_1 = 8$ and $n_2 = 60$, the F table shows that the upper .05 point is 2.097; and, for $n_1 = 8$ and $n_2 = \infty$, the upper .05 point is 1.938. For $n_1 = 12$ and $n_2 = 60$, the upper .05 point is 1.918; and, for $n_1 = 12$ and $n_2 = \infty$, the .05 point is 1.752. Interpolation for $n_1 = 10$ and $n_2 = 70$ is unnecessary, since the sample value of F lies far beyond the upper .05 point for any of these combinations. The correlation ratio is certainly significantly different from zero.
Correlation Index. If a curve such as a parabola is fitted to the original data and the correlation index for the population is zero, then the statistic

$$F = \frac{I^2/(k - 1)}{(1 - I^2)/(N - k)}$$

is likewise distributed like F with $n_1 = k - 1$ and $n_2 = N - k$, k being the number of constants in the regression equation for the curve. When logarithms or reciprocals can be used to make a distribution linear, …
Test for Linearity. Since a correlation ratio and a correlation index can each be compared with a correlation coefficient calculated from the same data, the question may be asked: How can it be told when a distribution is really linear and when it is nonlinear? This question can be answered by a sampling test. The null hypothesis is set up that the population is linear, and this hypothesis is tested in the light of the difference between the correlation coefficient and the correlation ratio (or the correlation index) for the given set of sample data.

If the population is a normal bivariate distribution in which the progression of means is actually linear, then the statistic

$$F = \frac{(\eta^2 - r^2)/(k - 2)}{(1 - \eta^2)/(N - k)} \quad\text{or the statistic}\quad F = \frac{(I^2 - r^2)/(k - 2)}{(1 - I^2)/(N - k)}$$

is distributed like F with $n_1 = k - 2$ and $n_2 = N - k$. Here k is the number of means from which the correlation ratio is computed or, in the case of I, the number of regression statistics in the equation of the curve.

For the Mount Holyoke example a correlation index was not computed; but the correlation ratio was calculated, and according to the above procedure a test for linearity can be made as follows:

$$r^2 = .80239, \quad \eta^2 = .82123, \quad k = 11, \quad N = 81, \quad n_1 = 9, \quad n_2 = 70$$

Hence,

$$F = \frac{(.82123 - .80239)/9}{.17877/70} = \frac{.00209}{.00255} = .8196$$


The .05 point in the F distribution for $n_1 = 9$ and $n_2 = 70$ is between 1.752 and 2.097. Thus .8196 is well within that limit, and the correlation is presumably linear.
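Both F computations on the Mount Holyoke correlation ratio can be sketched together. The helper names are hypothetical. Carried without intermediate rounding, the linearity F comes out to about .8197 rather than the .8196 obtained from the rounded quotient in the text. The first statistic is again about 32.16.

```python
def correlation_ratio_f(eta_sq, k, n):
    """F for H0: the population correlation ratio is zero; d.f. (k - 1, n - k)."""
    return (eta_sq / (k - 1)) / ((1 - eta_sq) / (n - k))

def linearity_f(eta_sq, r_sq, k, n):
    """F for H0: the regression is linear; d.f. (k - 2, n - k)."""
    return ((eta_sq - r_sq) / (k - 2)) / ((1 - eta_sq) / (n - k))

# Mount Holyoke figures as quoted in the text (eta^2 is given as .82124 in
# one place and .82123 in the other; each is used where it appears).
f_ratio = correlation_ratio_f(0.82124, k=11, n=81)
f_linear = linearity_f(0.82123, 0.80239, k=11, n=81)
print(f"correlation ratio F = {f_ratio:.2f}, linearity F = {f_linear:.4f}")
```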


PART III 


Advanced Sampling Problems 


CHAPTER XIII 
SAMPLING FROM A DISCRETE MANIFOLD POPULATION 


Chapter IX was concerned with sampling from a discrete twofold population. The purpose of the present chapter is to extend the discussion of that earlier chapter to a discrete manifold population, a population that is divided into more than two classes. The theory of sampling from a discrete manifold population has widespread application, for in many discrete populations the cases are grouped into several classes. Furthermore, the cases of any continuously distributed population can be, and commonly are, arbitrarily grouped into a finite number of classes.

The theoretical argument pertaining to manifold populations is fundamentally an extension of the argument pertaining to a twofold population. The ensuing analysis will accordingly pursue the same line of attack as that of the earlier chapter. The argument will be presented in full; but those parts that are identical with the earlier argument will be reviewed only briefly, and attention will primarily be centered on the complications that arise from a manifold instead of a twofold division of the population.


The fundamental assumptions are the same as in the simpler case. It is first assumed that the sample is small relative to the population from which it is drawn.

Although the sample is assumed to be small relative to the population, it is nevertheless assumed to be sufficiently large to permit the use of certain approximations. In some of the theoretical illustrations very small samples are used for the sake of simplicity, but in the discussion of the practical application of the argument it is assumed that the samples are large in the aggregate. As a general rule, it is assumed that the product of the population percentage of any class times the size of the sample (that is, $p_iN$) is at least equal to 5.

A second fundamental assumption is that the method of 
sampling is a random one. This implies once again that the 
results of repeated sampling by the selected method can be 
predicted by the calculation of probabilities for some mathematical model. The appropriate model will be described in the
next section. 

In order to make the analysis concrete, suppose the problem 
is that of estimating the percentages of various types of religious 
adherents in a given locality. The alternatives are taken 
to be Catholic, Protestant, and other religious denominations, 
including atheists, and the problem will be to estimate by means 
of a random sample the percentages of these various religious
adherents in the given population. 

The Sampling Distribution of Percentages. In making inferences about the percentages of various attributes in a manifold population, the sample statistics that naturally offer themselves for this purpose are the sample percentages of these attributes. Subsequently, another statistic will be described that is more commonly used in large samples; but for the moment attention will be centered upon the sample percentages. To make use of these percentages in testing hypotheses, etc., the sampling distribution of the percentages must be derived. This will be the principal task of the present section.

Derivation. The analysis of Chap. IX suggests that the sampling distribution of percentages can most conveniently be derived from the following mathematical model. Suppose there are 10 packs of cards, in each of which there are $p_1$ per cent red, $p_2$ per cent white, and $p_3$ per cent blue cards. Let all possible combinations of N cards each be made by combining without restriction each card in any given pack with a single card from each of the other packs. In each pack the probability of a red card is $p_1$, the probability of a white card is $p_2$, and the probability of a blue card is $p_3$. Since combinations are formed without restriction, the selection of any card from any one pack does not affect the probabilities in other packs. By the multiplication theorem, therefore, the probability of a combination having, for example, the first three cards red, the second five white, and the last two blue, is $p_1^3 p_2^5 p_3^2$.

The probability just indicated is true, however, for any combination in which there are three red, five white, and two blue cards. The total number of such combinations is given by the combinatorial formula

$$\frac{N!}{N_1!\,N_2!\,N_3!},$$

which, for the above example, equals

$$\frac{10!}{3!\,5!\,2!} = 2{,}520.$$

Accordingly, the total probability of such a combination would be

$$\frac{10!}{3!\,5!\,2!}\,p_1^3 p_2^5 p_3^2.$$

For N packs of cards and for combinations containing N cards, the probability of a combination containing $N_1$ red, $N_2$ white, and $N_3$ blue cards would be

$$\frac{N!}{N_1!\,N_2!\,N_3!}\,p_1^{N_1} p_2^{N_2} p_3^{N_3},$$

where $N_1 + N_2 + N_3 = N$.
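The counting argument above translates directly into code. In this minimal sketch the function names are our own, and exact `Fraction` arithmetic is used so that probabilities come out as exact ratios. The pack composition in the second example is hypothetical, because the text leaves $p_1$, $p_2$, and $p_3$ unspecified.

```python
from fractions import Fraction
from math import factorial, prod

def multinomial_coefficient(*counts):
    """Number of arrangements N! / (N1! N2! ... Nk!), where N = sum(counts)."""
    n = sum(counts)
    return factorial(n) // prod(factorial(c) for c in counts)

def multinomial_probability(counts, probs):
    """Probability of exactly these class counts in sum(counts) independent draws."""
    return multinomial_coefficient(*counts) * prod(p ** c for p, c in zip(probs, counts))

print(multinomial_coefficient(3, 5, 2))  # 2520

# Hypothetical pack composition, chosen only for illustration.
probs = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
print(multinomial_probability((3, 5, 2), probs))
```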


The last probability may be taken as a good prediction of the relative frequency with which samples of N would have $N_1$ Catholics, $N_2$ Protestants, and $N_3$ other denominations. The prediction refers to the whole set of samples of size N that might, by repeated sampling (with replacement²), be drawn at random from the given population. This is true because, if the process of sampling is random, it is reasonable to suppose that every particular combination of persons will appear as frequently as every other combination. The relative frequency of samples of N having $N_1$ Catholics, $N_2$ Protestants, and $N_3$ other denominations will then tend to be the same as the relative frequency of combinations of $N_1$ red, $N_2$ white, and $N_3$ blue cards among the set of all possible combinations of N cards each that might be made by selecting a card from each of N different packs.

² Each case would have to be "put back" so … on the next draw. This need only be true, …

Fig. 92. The multinomial distribution represented by the equation $p(N_1/N, N_2/N, N_3/N) = \frac{5!}{N_1!\,N_2!\,N_3!}\left(\frac{1}{3}\right)^{N_1}\left(\frac{1}{3}\right)^{N_2}\left(\frac{1}{3}\right)^{N_3}$. (Probabilities are expressed in terms of 243ds.)

The formula for the sampling distribution of percentages from a threefold population is thus

$$p\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right) = \frac{N!}{N_1!\,N_2!\,N_3!}\,p_1^{N_1} p_2^{N_2} p_3^{N_3},$$

or, for a manifold population,

$$p\left(\frac{N_1}{N}, \frac{N_2}{N}, \ldots, \frac{N_k}{N}\right) = \frac{N!}{N_1!\,N_2!\cdots N_k!}\,p_1^{N_1} p_2^{N_2}\cdots p_k^{N_k}.$$

This is merely a generalization of the formula for the binomial distribution. Its technical name is the "multinomial distribution," since it is also the equation for the terms of a multinomial expansion, that is,

$$(p_1 + p_2 + \cdots + p_k)^N.$$
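The link to the multinomial expansion can be checked by enumeration. The sketch lists every way of splitting N cases among the classes, computes each term, and confirms that the terms add up to $(p_1 + p_2 + \cdots + p_k)^N = 1$. The class probabilities used here are the illustrative values 1/2, 1/3, and 1/6, which are any probabilities that sum to 1. The helper name is ours.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def multinomial_terms(n, probs):
    """Map each (N1, ..., Nk) with N1 + ... + Nk = n to its term of (p1 + ... + pk)**n."""
    terms = {}
    for counts in product(range(n + 1), repeat=len(probs)):
        if sum(counts) != n:
            continue
        coefficient = factorial(n) // prod(factorial(c) for c in counts)
        terms[counts] = coefficient * prod(p ** c for p, c in zip(probs, counts))
    return terms

probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
terms = multinomial_terms(5, probs)
print(len(terms), sum(terms.values()))  # 21 1
```

With N = 5 and three classes there are 21 possible sample types, which is the number of rows in the tables that follow.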


The Multinomial Distribution. In the case of the binomial distribution, a sample could be described by a single percentage figure, for $N_1/N + N_2/N = 1$. As soon as $N_1/N$ was specified, the type of sample was fully designated. A graph of the binomial distribution could therefore be reduced to an ordinary two-dimensional graph. For a multinomial distribution this is not possible, for there are at least two independent percentages that characterize each sample. A graph of a multinomial distribution must therefore be multidimensional.
Illustration of Symmetrical Multinomial Distribution. A concrete illustration is given in Fig. 92, which is a graph of the multinomial distribution for which N = 5, k = 3, and $p_1 = p_2 = p_3 = \frac{1}{3}$. Accordingly, the graph represents the distribution of probabilities of sample percentages for samples of 5 drawn from a threefold population in which the percentages of cases falling in the three classes are all equal. The equation for this particular distribution is

$$p\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right) = \frac{5!}{N_1!\,N_2!\,N_3!}\left(\frac{1}{3}\right)^{N_1}\left(\frac{1}{3}\right)^{N_2}\left(\frac{1}{3}\right)^{N_3} = \frac{5!}{N_1!\,N_2!\,N_3!}\left(\frac{1}{3}\right)^{5},$$

and the values of $p(N_1/N, N_2/N, N_3/N)$ given by this equation for various values of $N_1/N$, $N_2/N$, and $N_3/N$ are recorded in Table 30. Since there are three classes, a graph of this multinomial distribution is three-dimensional. Each point in the three-dimensional diagram represents a particular set of values for $N_1/N$, $N_2/N$, and $N_3/N$. The combinations of $N_1/N$, $N_2/N$, and $N_3/N$ values that are possible under the conditions of the problem are marked by large dots. The probabilities of getting these various combinations, expressed in terms of 243ds, are written slightly below and to the right or to the left of the dots.

TABLE 30. The Multinomial Distribution Representing the Distribution of Sample Percentages for Which N = 5, k = 3, and p₁ = p₂ = p₃ = 1/3

     Type of sample           Probability
N₁/N    N₂/N    N₃/N     p(N₁/N, N₂/N, N₃/N)
1.0     .0      .0            1/243
.0      1.0     .0            1/243
.0      .0      1.0           1/243
.8      .2      .0            5/243
.8      .0      .2            5/243
.2      .8      .0            5/243
.2      .0      .8            5/243
.0      .8      .2            5/243
.0      .2      .8            5/243
.6      .4      .0           10/243
.6      .0      .4           10/243
.0      .6      .4           10/243
.4      .6      .0           10/243
.4      .0      .6           10/243
.0      .4      .6           10/243
.6      .2      .2           20/243
.2      .6      .2           20/243
.2      .2      .6           20/243
.4      .4      .2           30/243
.4      .2      .4           30/243
.2      .4      .4           30/243

First, it will be noticed that only a limited number of $N_1/N$, $N_2/N$, and $N_3/N$ combinations are possible. This is because $N_1/N$, $N_2/N$, and $N_3/N$ must always add up to 1. The various values of $N_1/N$, $N_2/N$, and $N_3/N$ are thus subject to the condition that

$$\frac{N_1}{N} + \frac{N_2}{N} + \frac{N_3}{N} = 1.$$

This condition, as the mathematicians put it, restricts the "degrees of freedom" in selecting $N_1/N$, $N_2/N$, and $N_3/N$ values. Without it there would be three degrees of freedom (three dimensions) for the selection of $N_1/N$, $N_2/N$, and $N_3/N$ values; with it the degrees of freedom are reduced to two (two dimensions). Geometrically this means that the $N_1/N$, $N_2/N$, and $N_3/N$ values must be confined to the slanting plane ABC, the equation of which is the condition $N_1/N + N_2/N + N_3/N = 1$. This particular aspect of the problem is emphasized here, because the concept of degrees of freedom will enter into most of the subsequent sampling analysis. If it is mastered now, the student should have little trouble with what is to come.


The symmetry of the given multinomial distribution will also be noted. Those combinations that have the most even distribution of cases among the three classes (.4,.4,.2; .4,.2,.4; …


and standard deviations of the multinomial distribution will be of the same type as those of the binomial distribution. This is in fact the case. The mean or expected percentage of cases falling in class 1 is $p_1$, the mean percentage of cases falling in class 2 is $p_2$, and in general the mean percentage of cases falling in class k is $p_k$. The standard deviation of the percentage of cases falling in class 1 is¹

$$\sigma_{N_1/N} = \sqrt{\frac{p_1(1 - p_1)}{N}},$$

the standard deviation of the percentage of cases falling in class 2 is

$$\sigma_{N_2/N} = \sqrt{\frac{p_2(1 - p_2)}{N}},$$

and, in general, the standard deviation of the percentage of cases falling in class k is

$$\sigma_{N_k/N} = \sqrt{\frac{p_k(1 - p_k)}{N}}.$$

No general proof will be given of these equations. Their validity may be tested, however, by applying them to the data of Table 30 and by comparing the mean and standard deviation thus obtained with the mean and standard deviation obtained by the use of conventional methods of calculation.

By definition the mean of a variable $X_i$ which has the relative frequencies $p_i$ is

$$\bar{X} = \sum p_i X_i.$$

The mean value of $N_1/N$ (i.e., the mean percentage of cases falling in class 1) is accordingly

$$\bar{X} = \sum p\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right)\frac{N_1}{N}.$$


Numerically, the value of $\bar{X}$ for class 1 can thus be obtained by multiplying the figures of the first column of Table 30 by the corresponding probabilities given in the last column. The results of these row-by-row calculations are given in the first column of Table 31, and the sum of this column is therefore the mean value of $N_1/N$. Consequently, $\bar{X}$ for $N_1/N$ is $81/243 = \frac{1}{3}$.

¹ For the binomial case the corresponding equation is $\sigma_{N_1/N} = \sqrt{p_1(1 - p_1)/N} = \sqrt{p_1 p_2/N}$, since $p_1 + p_2 = 1$.

TABLE 31. Calculation of the Mean and Standard Deviation of Percentage of Cases Falling in Class 1
(Multinomial distribution representing the sample percentages for which N = 5, k = 3, and p₁ = p₂ = p₃ = 1/3)

(1) (N₁/N) × probability     (2) (N₁/N)² × probability
        1/243                        1/243
        4/243                      3.2/243
        4/243                      3.2/243
        1/243                       .2/243
        1/243                       .2/243
        6/243                      3.6/243
        6/243                      3.6/243
        4/243                      1.6/243
        4/243                      1.6/243
       12/243                      7.2/243
        4/243                       .8/243
        4/243                       .8/243
       12/243                      4.8/243
       12/243                      4.8/243
        6/243                      1.2/243
Total  81/243                     37.8/243

By use of a short method the standard deviation of the sample values of $N_1/N$ (i.e., the sample percentages of cases falling in class 1) can be calculated from Table 30 by the formula¹

$$\sigma_{N_1/N} = \sqrt{\sum \left(\frac{N_1}{N}\right)^2 p\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right) - \bar{X}^2}.$$

The first term under the square root is merely the sum of the squares of the percentages of column (1), Table 30, multiplied by the corresponding probabilities. Each of these row-by-row products is given in column (2), Table 31, and their sum is the sum of this column, viz., 37.8/243. Hence,

$$\sigma_{N_1/N} = \sqrt{\frac{37.8}{243} - \left(\frac{81}{243}\right)^2} = \sqrt{.15556 - .11111} = \sqrt{.04444} = .2108.$$

The same results are obtained by the use of the formulas. The value of $\bar{X}$ calculated above is $\frac{1}{3}$, which equals $p_1 = \frac{1}{3}$. Using the equation for standard deviation,

$$\sigma_{N_1/N} = \sqrt{\frac{p_1(1 - p_1)}{N}} = \sqrt{\frac{\left(\frac{1}{3}\right)\left(\frac{2}{3}\right)}{5}} = .2108.$$

An Illustration of a Skewed Multinomial Distribution. Another example will now be used to illustrate a skewed multinomial distribution.

Suppose that there are again three classes. Let the probability of a case falling in class 1 be $p_1 = \frac{1}{2}$, the probability of a case falling in class 2 be $p_2 = \frac{1}{3}$, and the probability of a case falling in class 3 be $p_3 = \frac{1}{6}$. Again suppose that five cases are chosen at random. Under these conditions the probability of getting $N_1$ cases in class 1, $N_2$ cases in class 2, and $N_3$ cases in class 3 is as follows:

$$p\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right) = \frac{5!}{N_1!\,N_2!\,N_3!}\left(\frac{1}{2}\right)^{N_1}\left(\frac{1}{3}\right)^{N_2}\left(\frac{1}{6}\right)^{N_3}.$$

¹ This is merely the short formula $\sigma = \sqrt{\sum p_i X_i^2 - \bar{X}^2}$, where $X_i = N_1/N$ and $p_i = p(N_1/N, N_2/N, N_3/N)$.

The values of $p(N_1/N, N_2/N, N_3/N)$ for various values of $N_1/N$, $N_2/N$, and $N_3/N$ are given in Table 32, and a diagrammatic representation of the distribution of probabilities is shown in Fig. 93.


Fig. 93. The multinomial distribution represented by the equation $p(N_1/N, N_2/N, N_3/N) = \frac{5!}{N_1!\,N_2!\,N_3!}\left(\frac{1}{2}\right)^{N_1}\left(\frac{1}{3}\right)^{N_2}\left(\frac{1}{6}\right)^{N_3}$. (Probabilities are expressed in terms of 7,776ths.)

Since the condition $N_1/N + N_2/N + N_3/N = 1$ holds for this case as for the preceding one, the various combinations of $N_1/N$, $N_2/N$, and $N_3/N$ are again constrained to lie in the slanting plane ABC; that is, the degrees of freedom are only two. The distribution of probabilities is no longer symmetrical, however. The probability of a single case falling in the first class is greater than the probability of a single case falling in either the second or the third class, so the multinomial distribution shows a major bunching of cases in the direction of the $N_1/N$-axis. The probability of a case falling in the second class is greater than the probability of a case falling in the third class, so the distribution also shows a minor bunching of cases in the direction of the $N_2/N$-axis. All this is clearly revealed in Fig. 93. It is to be noted, however, that the mean and standard deviation formulas continue to hold true for this skewed form.


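Accordingly, the mean of $N_1/N$ equals $p_1 = \frac{1}{2}$, and the standard deviation of $N_1/N$ equals

$$\sigma_{N_1/N} = \sqrt{\frac{\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)}{5}} = \sqrt{.05} = .2236,$$

and similar calculations will give the means and standard deviations of $N_2/N$ and $N_3/N$.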
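The claim that the formulas still hold for the skewed case can be verified exactly. The sketch rebuilds Table 32 in 7,776ths and computes the mean and variance of each $N_j/N$ directly from it. It then compares them with $p_j$ and $p_j(1 - p_j)/N$. The variable names are ours.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

N = 5
p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))  # skewed case of Table 32

table_32 = {}  # (N1, N2, N3) -> probability expressed in 7,776ths
for counts in product(range(N + 1), repeat=3):
    if sum(counts) == N:
        coefficient = factorial(N) // prod(factorial(c) for c in counts)
        prob = coefficient * prod(pi ** c for pi, c in zip(p, counts))
        table_32[counts] = int(prob * 6 ** N)

means, variances = [], []
for j in range(3):
    m = sum(Fraction(c[j], N) * w for c, w in table_32.items()) / 6 ** N
    v = sum(Fraction(c[j], N) ** 2 * w for c, w in table_32.items()) / 6 ** N - m ** 2
    means.append(m)
    variances.append(v)

print(table_32[(4, 1, 0)], table_32[(2, 1, 2)])
print(means, variances)
```

For class 1 the variance is exactly 1/20, the square of the .2236 found above.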
Use of Multinomial Distribution in Testing Hypotheses. Like the binomial distribution, the multinomial distribution may be used to test hypotheses regarding the true division of cases in a population. Consider, for example, the following problem. Suppose there are three candidates for election to the same office. An inquiring reporter stops five persons at random and asks which candidate they favor. The results obtained are as follows:

Candidate    Number of Persons Favoring Specified Candidate
A            3
B            2
C            0

TABLE 32. Multinomial Distribution Representing the Distribution of Sample Percentages for Which N = 5, k = 3, and p₁ = 1/2, p₂ = 1/3, and p₃ = 1/6

     Type of sample           Probability
N₁/N    N₂/N    N₃/N     p(N₁/N, N₂/N, N₃/N)
1.0     .0      .0          243/7,776
.0      1.0     .0           32/7,776
.0      .0      1.0           1/7,776
.8      .2      .0          810/7,776
.8      .0      .2          405/7,776
.0      .8      .2           80/7,776
.2      .8      .0          240/7,776
.2      .0      .8           15/7,776
.0      .2      .8           10/7,776
.6      .4      .0        1,080/7,776
.6      .0      .4          270/7,776
.0      .6      .4           80/7,776
.4      .6      .0          720/7,776
.4      .0      .6           90/7,776
.0      .4      .6           40/7,776
.6      .2      .2        1,080/7,776
.2      .6      .2          480/7,776
.2      .2      .6          120/7,776
.4      .4      .2        1,080/7,776
.4      .2      .4          540/7,776
.2      .4      .4          360/7,776

Assume that the population from which this sample is taken is relatively large. Do these results disprove the hypothesis that the population as a whole is equally divided with respect to the three candidates? That is, does the sample division of .6, .4, and .0 disprove the hypothesis of a true division of .33+, .33+, and .33+?

To answer this question it is necessary, as in the twofold case, first to decide upon a coefficient of risk of rejecting a hypothesis when it is really true. Let this be .10 in the present instance. In other words, the investigating organization is willing to run the risk of rejecting a true hypothesis 10 times out of 100.

The second step is to derive the distribution of sample per- 
centages for samples of 5 from the assumed population. Since 
in the present instance the hypothesis is that $p_1 = p_2 = p_3 = \frac{1}{3}$,
the desired sampling distribution is that described by the multi- 
nomial distribution of Table 30 and Fig. 92. 

The third step is to select a group of unusual samples that may constitute a "region of rejection." In this particular instance the distribution of samples is discrete and the number of cases is small, so that approximation by a continuous distribution is likely to be very inaccurate. It may accordingly be impossible to find any region that has a probability exactly equal to the adopted coefficient of risk. In the present problem it seems advisable to proceed as follows. In Fig. 94, the plane ABC of Fig. 92 is laid out flat. There, all the more unusual samples (i.e., those with the lowest probabilities) are seen to lie on or outside of a given circle that is ruled double in the diagram. The probability of this group is .135. The region on the circle and the area outside of it would thus constitute a region of rejection with a probability greater than the adopted coefficient of risk, but this probability is not much greater.

It will be assumed that the investigating organization is indifferent as to the other values of the population percentages that might be true. That is, it is willing to run the same chance of rejecting the given hypothesis in whatever direction the true percentages might happen to lie relative to the assumed percentages. Under these circumstances, a symmetrical region of rejection, such as the circular area just described, should be the type of region adopted. In view of this situation, therefore, it will be assumed that the investigating body raises its coefficient of risk to .135 and adopts the given region as its region of rejection.

The final step in the analysis is to note that the given sample (3,2,0) does not fall in the region of rejection. The investigating organization consequently concludes that the hypothesis of an equal division of sentiment cannot be rejected as a result of the sample.

Fig. 94. The plane of the multinomial distribution represented by the equation $p(N_1/N, N_2/N, N_3/N) = \frac{5!}{N_1!\,N_2!\,N_3!}\left(\frac{1}{3}\right)^{5}$. (Probabilities are expressed in terms of 243ds.)

If the hypothesis had been other than an equal division of sentiment, the multinomial distribution would have been skewed. In these instances, the selection of a symmetrical region of rejection presents difficulties. These are not serious in practice, however. When very large samples (much larger than that assumed in the illustration) are used, it is possible to make a mathematical transformation of a skewed multinomial distribution that makes it more symmetrical. The same transformation permits a redescription of the distribution in terms of a new statistic that has a simple one-dimensional sampling distribution. This is discussed in the ensuing section.
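The rejection rule of the reporter illustration can be checked by enumeration. The sketch reads the double-ruled circle of Fig. 94 as the set of least probable sample types, those with probability 5/243 or less, and this reading of the figure is an assumption. It then adds their probabilities and tests whether the observed sample (3,2,0) falls inside. The exact risk is 33/243 ≈ .1358, which the text carries as .135.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

N = 5
p = (Fraction(1, 3),) * 3  # the hypothesis p1 = p2 = p3 = 1/3

dist = {}  # (N1, N2, N3) -> exact probability, as in Table 30
for counts in product(range(N + 1), repeat=3):
    if sum(counts) == N:
        coefficient = factorial(N) // prod(factorial(c) for c in counts)
        dist[counts] = coefficient * prod(pi ** c for pi, c in zip(p, counts))

# Assumed reading of the region: the least probable sample types
# (probability 5/243 or less), i.e. the (5,0,0)- and (4,1,0)-type samples.
cutoff = Fraction(5, 243)
region = {c for c, pr in dist.items() if pr <= cutoff}
risk = sum(dist[c] for c in region)

observed = (3, 2, 0)
print(f"risk = {risk} = {float(risk):.4f}; observed in region: {observed in region}")
```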
The Sampling Distribution of the Statistic $\sum (N_i - Np_i)^2/(Np_i)$. One of the important contributions of mathematical statistics concerns the statistic

$$\sum \frac{(N_i - Np_i)^2}{Np_i}.$$

When the set of all possible samples from a manifold population is described in terms of this statistic, the sampling distribution has been shown to be reasonably well described, if N is large, by the $\chi^2$ distribution. Hence, instead of using the multinomial distribution for testing various hypotheses, it is much more practical in the case of large samples to calculate the sample statistic $\sum (N_i - Np_i)^2/(Np_i)$ and use the $\chi^2$ distribution, for which tables are readily available.

The complete argument by which this important conclusion is reached is not given in this section. Multinomial problems nevertheless arise frequently, and the $\chi^2$ distribution is used as a substitute for the multinomial distribution in many of them. It is therefore well for the student to understand the basis upon which this use of the $\chi^2$ distribution rests. The argument by which it is established that the statistic $\sum (N_i - Np_i)^2/(Np_i)$ has a sampling distribution of the form of the $\chi^2$ distribution is accordingly discussed in a simplified way in the following section.

A Simple Version of the Argument: the Symmetrical Case. In Fig. 94, the plane ABC from Fig. 92 is laid out horizontally. The sample points are all marked with heavy dots, and the coordinates of each point are written beside it. In addition, the probability of each is indicated. The point $\bar{X}$, which was not shown in the original figure (Fig. 92), is the point whose coordinates are the means of the $N_i/N$ values. For this case, where $p_1 = p_2 = p_3 = \frac{1}{3}$, it is the point $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$. Although the point $\bar{X}$ is not a sample point,¹ it is nevertheless important because it marks the center of the distribution.

¹ If, for example, a number of cases divisible by three had been chosen instead of five, the mean would also have been one of the sample points.

It will be noted from Fig. 94 that, for the particular multinomial distribution represented, sample points of equal probability all lie equally distant from the mean point $\bar{X}$. That is, points of equal probability lie on a circle with center at $\bar{X}$. This characteristic of the distribution suggests that the set of all possible samples might be described in a somewhat simpler way. The description would show how the probability varies with the distance of sample points from the mean point $\bar{X}$, or preferably with the square of the distance, since the latter is easier to calculate.
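The statistic named at the head of this section is simple to compute. The sketch applies it to the reporter's sample of 3, 2, and 0 under the hypothesis of an equal division. The function name is ours. In this illustration $Np_i = \frac{5}{3}$ for every class, which falls well short of the rule of thumb $p_iN \ge 5$ stated earlier in the chapter. The $\chi^2$ approximation should therefore not be relied on for a sample this small, and the number is shown only to make the arithmetic concrete.

```python
def chi_square_statistic(observed, probs):
    """Sum over classes of (N_i - N p_i)**2 / (N p_i), with N = sum(observed)."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

# The reporter's sample: 3, 2 and 0 persons favoring candidates A, B and C.
stat = chi_square_statistic([3, 2, 0], [1/3, 1/3, 1/3])
print(round(stat, 4))
```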

In pursuance of this line of thought, the following table of probabilities may be calculated from Fig. 94 and used in all practical problems as a substitute for the original multinomial distribution. The point (.2,.4,.4), for example, is distant from the mean point $\bar{X}$ by¹

$$D = \sqrt{(.2 - .333)^2 + (.4 - .333)^2 + (.4 - .333)^2} = \sqrt{.0266}.$$

Points (.4,.2,.4) and (.4,.4,.2) are equally distant from $\bar{X}$. The probability of getting a particular one of these three sample results is 30/243. The probability of getting any one of them, i.e., the probability of getting a sample point distant from the mean point by $D = \sqrt{.0266}$, is therefore 3(30)/243 = 90/243 = .370. Similar calculations give the other entries in Table 33. A more useful form of Table 33 is given in Table 34. Table 34 cumulates the probabilities of Table 33; it thus gives the probability of getting a sample point that is at least as far from the mean point as the given sample point.

TABLE 33. Probabilities of Specified Values of D²

Square of distance of sample    Probability of getting a sample
point from mean point (D²)      point at this distance
.0266+                           90/243 = 0.370
.1066+                           60/243 = 0.247
.1866+                           60/243 = 0.247
.3466+                           30/243 = 0.123
.6666+                            3/243 = 0.012

¹ The general equation for measuring the square of the distance between points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ is $D^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2$.


TABLE 34. Probabilities of D² ≥ Specified Quantities

D²        Probability of as great or greater value
.0266+    243/243 = 1.000
.1066+    153/243 = .629
.1866+     93/243 = .382
.3466+     33/243 = .135
.6666+      3/243 = .012

The advantage of Table 34 is the ease with which it can be used to test the hypothesis $p_1 = p_2 = p_3 = \frac{1}{3}$. If .135 is adopted as the coefficient of risk, then Table 34 indicates immediately that the set of samples for which $D^2 \ge .3466$ would constitute a symmetrical region of rejection having the probability .135. Hence, for any sample, it is merely necessary to calculate

$$D^2 = \sum \left(\frac{N_i}{N} - p_i\right)^2$$

and note whether the sample $D^2$ falls above or below .3466. It is no longer necessary to draw a diagram such as Fig. 92 in order to mark off a two-dimensional region of rejection. The set of all possible samples has now been described in terms of a single statistic $D^2$, whose sampling distribution (Table 34) is one-dimensional. From the way Table 34 was derived, however, it is evident that the upper .135 tail of the distribution of $D^2$ is the mathematical equivalent of the symmetrical two-dimensional region of Fig. 92. The simpler one-dimensional description of the set of all possible samples thus accomplishes the same result as the multidimensional description.
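Tables 33 and 34 can be regenerated from the multinomial probabilities. The sketch groups sample points by their exact squared distance $D^2$ from $(p_1, p_2, p_3)$ and then cumulates the upper tail. The exact distance for the (.8,.2,.0)-type points is 26/75 = .3466+. The exact tail probabilities (for example 153/243 ≈ .630) can differ in the third decimal from the tabled values, which were obtained by summing rounded entries. Names such as `d_squared` are ours.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import factorial, prod

N = 5
p = (Fraction(1, 3),) * 3

def d_squared(counts):
    """Squared distance of the sample point (N1/N, N2/N, N3/N) from (p1, p2, p3)."""
    return sum((Fraction(c, N) - pi) ** 2 for c, pi in zip(counts, p))

table_33 = defaultdict(Fraction)  # D^2 -> probability of a sample point at that distance
for counts in product(range(N + 1), repeat=3):
    if sum(counts) == N:
        coefficient = factorial(N) // prod(factorial(c) for c in counts)
        table_33[d_squared(counts)] += coefficient * prod(pi ** c for pi, c in zip(p, counts))

# Table 34: probability of a D^2 at least as large as each tabulated value.
table_34 = {d2: sum(pr for v, pr in table_33.items() if v >= d2) for d2 in sorted(table_33)}
for d2, tail in table_34.items():
    print(f"D^2 = {float(d2):.4f}   P(D^2 >= value) = {float(tail):.3f}")

sample_d2 = d_squared((3, 2, 0))
print("reporter's sample D^2 =", float(sample_d2), "reject:", sample_d2 >= Fraction(26, 75))
```

The reporter's sample gives $D^2 = 14/75 \approx .187$, below the cutoff, in agreement with the earlier conclusion that it does not fall in the region of rejection.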

The General Case. When the hypothesis gives to $p_1$, $p_2$, and $p_3$ values that are not equal, the result is a skewed multinomial distribution. Generally, therefore, it is not possible to draw neat circles through points of equal probability, such as was done in the special case just considered. Nor is it possible to redescribe the set of all possible samples exactly in terms of the distance $D^2$ from the point given by the mean percentages $p_1$, $p_2$, and $p_3$. Fortunately, however, a certain procedure can be adopted that will give approximate results similar to those obtained in the simpler case. This procedure is as follows:

In the more general case, considerable advantage is gained by changing the scale units. Instead of taking $N_1/N$, $N_2/N$, and $N_3/N$ as the coordinates of a sample point, it is found convenient to take $N_1/\sqrt{Np_1}$, $N_2/\sqrt{Np_2}$, and $N_3/\sqrt{Np_3}$ as the sample coordinates.¹ This means that the scales are modified in proportion to the square roots of the hypothetical probabilities, the net effect of which is to make the distribution more symmetrical. In terms of the new coordinates, the point represented by the mean percentages becomes $p_1\sqrt{N/p_1}$, $p_2\sqrt{N/p_2}$, and $p_3\sqrt{N/p_3}$, or, simply, $\sqrt{Np_1}$, $\sqrt{Np_2}$, and $\sqrt{Np_3}$. The square of the distance between this point and any sample point becomes

$$\sum \left(\frac{N_i}{\sqrt{Np_i}} - \sqrt{Np_i}\right)^2 \quad\text{or}\quad \sum \frac{(N_i - Np_i)^2}{Np_i}.$$

¹ That is, the old coordinates are multiplied by $\sqrt{N/p_i}$.

This change in the scale units produces an approximate symmetry, and the symmetry improves the larger the value of N. Owing to it, an approximate description of the set of all possible samples can be given in terms of

$$D'^2 = \sum \frac{(N_i - Np_i)^2}{Np_i}.$$

When this is done mathematically, the distribution of probabilities is found to be approximately of the form of a $\chi^2$ distribution, where n in the $\chi^2$ equation is taken equal to the number of classes minus 1.

To illustrate this important conclusion, suppose that sampling is made from a population in which the various items fall into three classes, and assume that the percentages of items in these three classes are $p_1$, $p_2$, and $p_3$. Then the relative frequencies, or probabilities, with which various samples would have values of $D'^2 = \sum (N_i - Np_i)^2/(Np_i)$ equal to or exceeding certain specified values are as follows:


TABLE 35. Probabilities of D′² ≥ Specified Quantities

Values of D′²    Probability of as Great or Greater Value
                 (= Probability of as Great or Greater χ² for n = 3 − 1 = 2)¹
.0201            .99
.0404            .98
.103             .95
.211             .90
.446             .80
.713             .70
1.386            .50
2.408            .30
3.219            .20
4.605            .10
5.991            .05
7.824            .02
9.210            .01
That is, in the set of all possible samples from the given population, the quantity $D'^2 = \sum (N_i - Np_i)^2/(Np_i)$ would have a value equal to or greater than 5.991 in 5 per cent of the samples, a value equal to or exceeding 4.605 in 10 per cent of the samples, and so forth.
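For two degrees of freedom the $\chi^2$ distribution has a simple closed form, $P(\chi^2 \geq x) = e^{-x/2}$, so the entries of Table 35 can be regenerated without a table. The Python sketch below relies only on that closed form, which does not hold for other values of $n$; the function names are illustrative, not taken from any library.

```python
import math

def chi2_df2_upper_tail(x):
    """P(chi-square with 2 degrees of freedom >= x) = exp(-x/2)."""
    return math.exp(-x / 2.0)

def chi2_df2_critical(p):
    """Value x equalled or exceeded with probability p (valid for 2 d.f. only)."""
    return -2.0 * math.log(p)

probabilities = [0.99, 0.98, 0.95, 0.90, 0.80, 0.70,
                 0.50, 0.30, 0.20, 0.10, 0.05, 0.02, 0.01]
for p in probabilities:
    # One row of Table 35: value of D'^2, then probability of as great or greater
    print(f"{chi2_df2_critical(p):7.4f}   {p:.2f}")
```

The printed values agree with Table 35 to the precision shown there; for example, the .05 row gives 5.9915 and the .70 row gives 0.7133.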
In the original description of the $\chi^2$ distribution, $n$ was said to govern the nature of the curve.³ Here $n$ may be identified with the degrees of freedom. The reason for this in such a problem should now be clear. When the number of classes into which a discrete population is divided is three, as has been assumed in the above discussion, then the quantity $D'^2 = \sum (N_i - Np_i)^2/(Np_i)$ has a sampling distribution of the form of the $\chi^2$ distribution with $n = 3 - 1 = 2$. It will be recalled, however, that when the set of all possible sample points is described in terms of the percentages falling into each class, the various possible sample percentages must conform to the equation

$$\frac{N_1}{N} + \frac{N_2}{N} + \frac{N_3}{N} = 1.$$

Geometrically this meant that all the sample points had to lie on a plane (the plane ABC of Fig. 92, for example). It was accordingly said that there were only two degrees of freedom for the selection of the various possible samples. Since for the case of three classes $n$ in the $\chi^2$ formula is equal to 2, this $n$ becomes the same as the degrees of freedom in the given problem.

² This is merely the row "n = 2" of a $\chi^2$ table (Appendix, Table VII).
³ See p. 111.

This relationship also holds true for any number of classes. For the inevitable equation

$$\frac{N_1}{N} + \frac{N_2}{N} + \cdots + \frac{N_k}{N} = 1,$$

inevitable because the sum of all the class percentages must be 100 per cent, always reduces the degrees of freedom with which various possible samples may be selected from $k$ to $k - 1$, while it is also always true that the quantity $D'^2 = \sum (N_i - Np_i)^2/(Np_i)$ has a distribution approximately of the form of the $\chi^2$ distribution, with $n$ in the $\chi^2$ equation equal to $k - 1$. Hence in problems of this kind the $n$ in the $\chi^2$ equation is always the same as the degrees of freedom.
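The claim that $D'^2$ is approximately distributed as $\chi^2$ with $k - 1$ degrees of freedom can be checked by simulation. The Python sketch below draws repeated multinomial samples from a hypothetical three-class population (probabilities .5, .3, and .2, chosen only for illustration), computes $D'^2$ for each, and reports the share of samples at or beyond 5.991. If the approximation holds, that share should be near .05. The sample size, number of trials, and random seed are arbitrary assumptions.

```python
import random
from collections import Counter

def d_prime_squared(counts, probs):
    """D'^2 = sum over classes of (N_i - N p_i)^2 / (N p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def share_at_or_beyond(probs, n, critical, trials=4000, seed=7):
    """Fraction of simulated multinomial samples of size n with D'^2 >= critical."""
    rng = random.Random(seed)
    classes = range(len(probs))
    hits = 0
    for _ in range(trials):
        # One random sample of n items, each falling into a class with probability p_i
        tally = Counter(rng.choices(classes, weights=probs, k=n))
        counts = [tally[i] for i in classes]
        if d_prime_squared(counts, probs) >= critical:
            hits += 1
    return hits / trials

share = share_at_or_beyond([0.5, 0.3, 0.2], n=200, critical=5.991)
print(f"share of samples with D'^2 >= 5.991: {share:.3f}")
```

Changing the number of classes to $k$ and the critical value to the .05 point for $k - 1$ degrees of freedom should again give a share near .05.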

Estimation of Population Percentages. Zones of Confidence. When, as in the case of the multinomial distribution and the distribution of $\sum (N_i - Np_i)^2/(Np_i)$, there is more than one population parameter to be estimated, the determination of confidence intervals for these parameters presents difficulties. In abstract terms, the problem is simple enough. Thus, for any given case, the .05 point, say, of the distribution of $\sum (N_i - Np_i)^2/(Np_i)$ could be determined from a $\chi^2$ table (all that need be known for this is the value of $n$ in the $\chi^2$ formula), and the sample values of $N_i$ could then be substituted in the equation

$$\sum_i \frac{(N_i - Np_i)^2}{Np_i} = \chi^2_{.05},$$

where $\chi^2_{.05}$ denotes that .05 point. The resulting equation would define the population percentages $p_i$ that would just pass as reasonable (assuming a coefficient of risk of .05).
Geometrically, the locus of the values of the $p_i$'s satisfying this equation would constitute a definite geometrical figure such as an ellipse or, when there are more than three parameters, some sort of an ellipsoid. All values lying outside this locus would be considered unreasonable values, and all inside would be considered reasonable values. The enclosed area would constitute a "zone of confidence," and in repeated sampling it might be said that this zone would include the true value 95 per cent of the time.

The difficulty that arises in trying to make use of the abstract analysis above is that the locus marking the zone of confidence is not a simple ellipse or ellipsoid but a much more complicated figure. In fact, practical determination of the values of $p_i$ constituting that locus would involve so much trouble that it is almost never undertaken. In general, the investigator is content to test particular hypotheses that are of importance rather than attempting to class all hypotheses into those that are reasonable and those that are unreasonable.
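The trouble can be seen by brute force. The Python sketch below scans a grid of candidate percentages $(p_1, p_2)$, sets $p_3 = 1 - p_1 - p_2$, and keeps every point at which $D'^2$ stays below 5.991, the .05 point for $n = 2$. As illustrative input it uses the 1944 poll counts 510, 478, and 12 discussed later in this chapter. The grid step is an arbitrary assumption; a finer grid, or more classes, quickly makes such a search expensive, which is the practical difficulty noted above.

```python
def d_prime_squared(counts, probs):
    """D'^2 = sum over classes of (N_i - N p_i)^2 / (N p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def confidence_zone(counts, critical=5.991, step=0.005):
    """Grid points (p1, p2, p3), with p3 = 1 - p1 - p2, not rejected at `critical`."""
    steps = int(round(1.0 / step))
    zone = []
    for i in range(1, steps):
        for j in range(1, steps - i):  # keeps p3 >= step, so no division by zero
            p1, p2 = i * step, j * step
            p3 = 1.0 - p1 - p2
            if d_prime_squared(counts, (p1, p2, p3)) < critical:
                zone.append((p1, p2, p3))
    return zone

sample = (510, 478, 12)
zone = confidence_zone(sample)
print(len(zone), "grid points lie inside the zone")
print("p1 ranges from", min(z[0] for z in zone), "to", max(z[0] for z in zone))
```

The zone produced this way is only a cloud of grid points; describing its boundary exactly would require solving the equation above for every direction in the space of the $p_i$.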

Maximum-likelihood. Although determination of zones of confidence for the values of the population percentages is thus not usually undertaken, single maximum-likelihood estimates may be readily made. For it is found that the values of $p_1$, $p_2$, $p_3$, etc., that maximize the probability of a given sample result are but the sample percentages $N_1/N$, $N_2/N$, $N_3/N$, etc.¹ Maximum-likelihood estimates of the population percentages are thus a very simple matter.

¹ This may be demonstrated as follows: If there are three class divisions, the probability of a given sample point is

$$P\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right) = \frac{N!}{N_1!\,N_2!\,N_3!}\,p_1^{N_1}\,p_2^{N_2}\,p_3^{N_3}.$$

By the method of maximum likelihood, the values of $p_1$, $p_2$, and $p_3$ are to be chosen so that the logarithm of this probability is a maximum. The percentages $p_1$, $p_2$, and $p_3$ are not independent, however, for they must add up to 100 per cent, i.e., $p_1 + p_2 + p_3 = 1$. In the process of estimation, therefore, let $p_1$ and $p_2$ be the independent estimates to be made, and let $p_3$ be determined from $p_1$ and $p_2$. Under these conditions the maximizing values of $p_1$, $p_2$, and $p_3$ must be such that the partial derivatives of the logarithm of the probability with respect to $p_1$ and $p_2$ are both equal to zero. The logarithm of the above probability is

$$\log \frac{N!}{N_1!\,N_2!\,N_3!} + N_1 \log p_1 + N_2 \log p_2 + N_3 \log p_3,$$

and the partial derivatives of this with respect to $p_1$ and $p_2$, it being remembered that $\partial p_3/\partial p_1 = \partial p_3/\partial p_2 = -1$, are

$$\frac{N_1}{p_1} - \frac{N_3}{p_3} = 0, \qquad \frac{N_2}{p_2} - \frac{N_3}{p_3} = 0,$$

or $N_1 p_3 = N_3 p_1$ and $N_2 p_3 = N_3 p_2$. Now add these two equations, and to each side of the total add $N_3 p_3$, giving

$$(N_1 + N_2 + N_3)\,p_3 = N_3\,(p_1 + p_2 + p_3).$$

Since $N_1 + N_2 + N_3 = N$ and $p_1 + p_2 + p_3 = 1$, this gives (using a breve to denote an estimate)

$$\breve p_3 = \frac{N_3}{N}.$$

Then from the two previous equations, it is found that $\breve p_1 = N_1/N$ and $\breve p_2 = N_2/N$. Hence the maximum-likelihood estimates of the population percentages are the sample percentages.


APPLICATION OF THEORY TO SELECTED PROBLEMS

Sampling Public Opinion. In sampling public opinion, the division of opinion is often threefold or more. In all such instances the foregoing theory becomes immediately applicable. Consider, for example, the following problem:

In 1940 the election returns were as follows:

                                                     Per Cent
F. D. Roosevelt, Democratic candidate .................. 54.7
Wendell Willkie, Republican candidate .................. 44.8
Other candidates .......................................  0.5

Suppose that in the fall of 1944 a given polling agency took a random sample of 1,000 voters and found that 510 were for Roosevelt, again the Democratic candidate, 478 were for Dewey, the Republican candidate, and 12 were for other candidates.
Could it have been reasonably inferred from this sample that 
popular sentiment regarding the various parties had changed? 
In other words, would the hypothesis of 54.7 per cent Democratic, 
44.8 per cent Republican, and 0.5 per cent other affiliations have 
been a tenable hypothesis in view of the sample results? 

To answer this question the investigation proceeds as follows: First the statistic $D'^2 = \sum (N_i - Np_i)^2/(Np_i)$ is calculated. This gives, in the present instance,

$$D'^2 = \frac{(510 - 547)^2}{547} + \frac{(478 - 448)^2}{448} + \frac{(12 - 5)^2}{5} = 2.50 + 2.01 + 9.80 \approx 14.3.$$

Next it is noted that there are only three classes, so that $D'^2$ is approximately distributed as $\chi^2$, with the degrees of freedom $n$ equal to $3 - 1 = 2$. Finally a region of rejection is chosen. If .05 is taken as the coefficient of risk, a $\chi^2$ table shows that for $n = 2$ the region consisting of values of $D'^2$ equal to or greater than 5.991 is an appropriate region of rejection to select.¹ In the given instance the sample value $D'^2 \approx 14.3$ clearly falls in the region of rejection. Consequently, the hypothesis that there has been no change in political sentiment cannot be accepted. Since the maximum-likelihood estimates of population percentages are the sample percentages,² it follows in this case that the maximum-likelihood estimates to be made of the true political sentiment are 51 per cent Democratic, 47.8 per cent Republican, and 1.2 per cent other parties.
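The arithmetic of this example is easy to script. The Python sketch below recomputes $D'^2$ for the poll, compares it with the .05 point for two degrees of freedom (obtained from the closed form $-2\ln .05$, which is valid only for $n = 2$), and confirms numerically that the sample percentages give a larger multinomial log-likelihood than the 1940 percentages, as the maximum-likelihood result leads one to expect. The variable names are illustrative.

```python
import math

observed = {"Democratic": 510, "Republican": 478, "Other": 12}
hypothesis = {"Democratic": 0.547, "Republican": 0.448, "Other": 0.005}

N = sum(observed.values())
d2 = sum((observed[k] - N * hypothesis[k]) ** 2 / (N * hypothesis[k])
         for k in observed)
df = len(observed) - 1
critical = -2.0 * math.log(0.05)  # .05 point of chi-square; closed form valid for 2 d.f. only

def log_likelihood(probs):
    """Multinomial log-likelihood, omitting the constant log N!/(N_1! N_2! N_3!)."""
    return sum(observed[k] * math.log(probs[k]) for k in observed)

# Maximum-likelihood estimates are simply the sample percentages
ml_estimates = {k: v / N for k, v in observed.items()}

print(f"D'^2 = {d2:.2f} with {df} d.f.; .05 point = {critical:.3f}")
print("reject the hypothesis of no change:", d2 >= critical)
print("maximum-likelihood estimates:", ml_estimates)
```

Note that the "other candidates" class, with an expected frequency of only 5, contributes most of $D'^2$.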
Goodness of Fit of a Frequency Curve. The use of the $\chi^2$ distribution in testing the goodness of fit of a frequency curve has already been discussed in a practical way in Chap. VII.³ The theoretical basis of the method there outlined will now be discussed.

¹ It will be recalled that if $N_1/\sqrt{Np_1}$, $N_2/\sqrt{Np_2}$, etc., are taken as the coordinates of sample points, $D'$ constitutes the distance from a sample point to the mean point $\sqrt{Np_1}$, $\sqrt{Np_2}$, etc. (see p. 326). Accordingly, the region $D'^2 \geq 5.991$ consists of all points lying outside a circle about the mean point. This region is unbiased with respect to other values of $p_1$, $p_2$, and $p_3$ than those chosen by hypothesis, in the sense that the probability of samples falling in this region is least when $p_1$, $p_2$, and $p_3$ have the given hypothetical values and increases in whatever direction $p_1$, $p_2$, and $p_3$ deviate from these hypothetical values.
² See p. 329.
³ See pp. 137–152.
The procedure outlined previously called for the calculation of $\sum (F - f)^2/f$, where $F$ was the frequency of cases in any class interval given by a sample histogram and $f$ was the corresponding frequency for that interval given by some theoretical frequency curve. It should be recognized now that this is merely a special form of $\sum (N_i - Np_i)^2/(Np_i)$. For $N_i$ is the sample number of cases falling in any class and hence is the same as $F$ in the other equation, and $Np_i$ is the number that would be expected to fall in that class if the sample total were divided in the same proportion as the population total and hence is the same as $f$. Since, as shown in the general discussion earlier in this chapter, $\sum (N_i - Np_i)^2/(Np_i)$ is approximately distributed in the form of a $\chi^2$ distribution, so in the special case of a comparison of a sample histogram with a theoretical frequency curve the quantity $\sum (F - f)^2/f$ is approximately distributed in the form of a $\chi^2$ distribution.


There is one thing new in this problem, however, and this has to do with the determination of the degrees of freedom. When the theoretical curve that is fitted to a sample histogram is a normal curve, say, the theoretical probabilities pertaining to each class are not given directly by the mere hypothesis of normality. It is first necessary to estimate the mean and standard deviation of the curve from the sample. The effect of this is to impose additional conditions on the sample frequencies that reduce the degrees of freedom from $k$ (the number of classes) minus 1 to $k$ minus 3. For in all cases there is the condition

$$\frac{N_1}{N} + \frac{N_2}{N} + \cdots + \frac{N_k}{N} = 1.$$

Setting the mean and standard deviation of the curve equal to the mean and standard deviation of the sample histogram imposes the additional conditions¹

$$\sum_i \frac{N_i}{N} X_i = \sum_i p_i X_i,$$

$$\sum_i \frac{N_i}{N} (X_i - \bar X)^2 = \sum_i p_i (X_i - \bar X)^2.$$

¹ It will be recalled that $N_i$ is the same as $F$ and that $p_i$ is the same as $f/N$. It will also be recalled that when relative frequencies are used the mean of a distribution is $\sum (F/N)X$ and its standard deviation is $\sqrt{\sum (F/N)(X - \bar X)^2}$.


This means that in the set of all possible samples (in this case, sample histograms) that might be drawn from a given population, only those samples will be considered for which the sample percentages $N_i/N$ satisfy the above three equations. Probabilities are thus calculated with respect to this special group of the set of all possible samples, and it is in this sense that the degrees of freedom are reduced. In general, when a frequency curve is fitted to a sample histogram and $c$ parameters of the curve are estimated from the sample histogram, $c + 1$ conditions are imposed upon the sample percentages and the part of the set of all possible samples that is considered has $k - c - 1$ degrees of freedom. That means that $\sum (N_i - Np_i)^2/(Np_i)$, or its equivalent in terms of the frequency distribution analysis, viz., $\sum (F - f)^2/f$, has a sampling distribution approximately of the form of the $\chi^2$ distribution with the degrees of freedom equal to $k - c - 1$. This is the explanation of the special value given to $n$ in the procedure described in Chap. VII.
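A minimal Python sketch of this bookkeeping follows: theoretical class frequencies $f$ from a fitted normal curve (using the error function for the normal integral), the statistic $\sum (F - f)^2/f$, and the degrees of freedom $k - c - 1$. The function names are hypothetical, and the caller is assumed to have already estimated the mean and standard deviation from the sample, so that $c = 2$ for a normal fit.

```python
import math

def normal_cdf(x, mean, sd):
    """Area under the normal curve to the left of x (x may be +/- infinity)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def expected_normal_frequencies(boundaries, mean, sd, total):
    """Theoretical frequencies f for the intervals between successive boundaries."""
    return [total * (normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd))
            for lo, hi in zip(boundaries, boundaries[1:])]

def chi_square_fit(observed, expected, params_estimated):
    """Return (sum of (F - f)^2 / f, degrees of freedom k - c - 1)."""
    statistic = sum((F - f) ** 2 / f for F, f in zip(observed, expected))
    df = len(observed) - params_estimated - 1
    return statistic, df
```

Using $-\infty$ and $+\infty$ as the outermost boundaries makes the theoretical frequencies add up to the sample total. For a normal fit to a histogram of 10 classes, `chi_square_fit(F, f, params_estimated=2)` reports $10 - 2 - 1 = 7$ degrees of freedom.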
Testing Independence. Still another use of the multinomial distribution and the $\chi^2$ distribution based upon it is to test whether one classification is independent of another classification or, conversely, whether the two are associated. Particular measures of association or correlation were discussed in Chap. XII, and tests of significance were developed for these measures. At this point, attention will be directed primarily to testing the existence of an association, whatever its form or degree.

Consider a simple case. In general, the percentage of males and females in a large population tends to be the same, no matter what the color of the people. That is, the sex ratio is independent of color. Within a limited area, however, this may not hold true. Economic and social forces in certain metropolitan districts, for example, may cause the percentage of black and other colored males to exceed considerably the percentage of white males, and vice versa for females. Suppose a sample of 1,000 people is taken from a given city with a view to studying the relationship between sex and color. For this purpose the 1,000 people are "cross classified" as follows:


TABLE 36.—1,000 PEOPLE CLASSIFIED AS TO COLOR AND SEX

Color            Males    Females    Totals
White              380        320       700
Black              150         50       200
Other colors        70         30       100
Totals             600        400     1,000


The question is: Does this sample of 1,000 people demonstrate definitely that the population of the city as a whole has a higher percentage of black and other-color males than white males, or can this sample result reasonably be considered the effect of chance? Or, to put it another way, does this sample show conclusively that the sex ratio in this city is not independent of color?

In a problem of this kind, it is more convenient to test the negative, or "null," hypothesis. For if the classifications are assumed to be independent within the population as a whole, it is then possible to tell something about the expected distribution of the cases. Thus, if two classifications are independent, the distribution of cases in the various categories of one classification may be expected to be the same for each of the categories of the other classification. On the other hand, if the positive hypothesis of correlation is set up, nothing can be told about the expected distribution of cases unless the exact form of correlation is specified. Thus, in the absence of specific information regarding the nature of the correlation, it is customary to test the null hypothesis of independence.


With reference to the illustration, the assumption of independence means that the distribution by color is the same for both sexes. That is, the probability of picking a male at random (and hence the probability of picking a female at random) is independent of the probability of picking a white person at random, the probability of picking a black person at random, and the probability of picking a person of another color at random. Likewise, both the probability of picking a white person at random and the probability of picking a person of another color are independent of the probability of picking a male at random and the probability of picking a female at random. Under these conditions, the probability of picking a white male, for example, is, according to the multiplication theorem for independent attributes, the product of the probability of picking a male and the probability of picking a white person. Or, again, the probability of picking a black female is simply the product of the probability of picking a female and the probability of picking a black person.

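Symbolically independence may be described as follows:¹ If one classification is represented by the roman numerals I and II and the other by the arabic numbers 1, 2, and 3, then if the two classifications are independent, $p_{\mathrm{I}}$ (and hence $p_{\mathrm{II}}$, which equals $1 - p_{\mathrm{I}}$) is independent of $p_1$, $p_2$, and $p_3$; and $p_1$ and $p_2$ (and hence $p_3$, which equals $1 - p_1 - p_2$) are independent of $p_{\mathrm{I}}$ and $p_{\mathrm{II}}$.

TABLE 37.—SYMBOLIC ILLUSTRATION OF INDEPENDENCE

             I                    II
1     $p_{\mathrm{I}1}$     $p_{\mathrm{II}1}$     $p_1$
2     $p_{\mathrm{I}2}$     $p_{\mathrm{II}2}$     $p_2$
3     $p_{\mathrm{I}3}$     $p_{\mathrm{II}3}$     $p_3$
      $p_{\mathrm{I}}$      $p_{\mathrm{II}}$      1

Hence, $p_{\mathrm{I}1} = p_{\mathrm{I}}p_1$; $p_{\mathrm{I}2} = p_{\mathrm{I}}p_2$; $p_{\mathrm{I}3} = p_{\mathrm{I}}p_3$; $p_{\mathrm{II}1} = p_{\mathrm{II}}p_1$; $p_{\mathrm{II}2} = p_{\mathrm{II}}p_2$; and $p_{\mathrm{II}3} = p_{\mathrm{II}}p_3$.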

It is to be noted that the assumption of independence does not give the actual values of any of the above probabilities but merely states the existence of certain relationships among them.² Consequently, if the actual frequencies to be expected on the basis of the assumption of independence are to be found, it is necessary to estimate some of the underlying probabilities from the sample data (just as the mean and standard deviation of the population were estimated in the previous example).

¹ Reference to Table 37 may help the reader at this point.
² Just as knowledge that a population is normally distributed tells nothing about the actual distribution of probabilities but gives only its general form.
Thus, for the problem in hand, the probability of picking a male at random ($p_{\mathrm{I}}$) can be estimated as equal to the percentage of males in the sample (600/1,000 = 60 per cent), the probability of picking a white person at random ($p_1$) can be estimated as equal to the percentage of white persons in the sample (700/1,000 = 70 per cent), and the probability of picking a black person at random ($p_2$) can be estimated as equal to the percentage of black persons in the sample (200/1,000 = 20 per cent). Symbolically, these "estimating equations" are

$$\frac{N_{\mathrm{I}1} + N_{\mathrm{I}2} + N_{\mathrm{I}3}}{N} = \breve p_{\mathrm{I}}, \qquad \frac{N_{\mathrm{I}1} + N_{\mathrm{II}1}}{N} = \breve p_1, \qquad \frac{N_{\mathrm{I}2} + N_{\mathrm{II}2}}{N} = \breve p_2,$$

or

$$N_{\mathrm{I}1} + N_{\mathrm{I}2} + N_{\mathrm{I}3} = N\breve p_{\mathrm{I}}, \qquad N_{\mathrm{I}1} + N_{\mathrm{II}1} = N\breve p_1, \qquad N_{\mathrm{I}2} + N_{\mathrm{II}2} = N\breve p_2,$$

where the $N_{ij}$'s refer to the sample frequencies in the various classes, $N$ is the total number of cases, and the $\breve p$'s are the estimated probabilities.

Estimates of the other class probabilities¹ can be computed from these three probabilities with the help of the relationships given by the hypothesis of independence and the laws for addition and multiplication of probabilities. Thus

$\breve p_{\mathrm{II}} = 1 - \breve p_{\mathrm{I}}$ = 100 per cent − 60 per cent = 40 per cent;
$\breve p_3 = 1 - \breve p_1 - \breve p_2$ = 100 per cent − 70 per cent − 20 per cent = 10 per cent;
$\breve p_{\mathrm{I}1} = \breve p_{\mathrm{I}}\breve p_1$ = (60 per cent)(70 per cent) = 42 per cent;
$\breve p_{\mathrm{I}2} = \breve p_{\mathrm{I}}\breve p_2$ = (60 per cent)(20 per cent) = 12 per cent;
$\breve p_{\mathrm{I}3} = \breve p_{\mathrm{I}}\breve p_3$ = (60 per cent)(10 per cent) = 6 per cent;
$\breve p_{\mathrm{II}1} = \breve p_{\mathrm{II}}\breve p_1$ = (40 per cent)(70 per cent) = 28 per cent;
$\breve p_{\mathrm{II}2} = \breve p_{\mathrm{II}}\breve p_2$ = (40 per cent)(20 per cent) = 8 per cent;
and
$\breve p_{\mathrm{II}3} = \breve p_{\mathrm{II}}\breve p_3$ = (40 per cent)(10 per cent) = 4 per cent.

¹ It makes no difference what class probabilities are selected as the original three to be estimated directly from the sample data.
Applied to the number in the sample ($N = 1{,}000$), the foregoing estimated probabilities yield the following theoretical, or expected, frequencies for the various classes:

TABLE 38.—1,000 PEOPLE CLASSIFIED AS TO COLOR AND SEX (EXPECTED FREQUENCIES)

Color            Males    Females    Totals
White              420        280       700
Black              120         80       200
Other colors        60         40       100
Totals             600        400     1,000


The actual frequencies differ considerably from these expected frequencies, and the question arises: Are these differences sufficiently great to disprove the hypothesis of independence?

This is a problem involving estimated probabilities $\breve p_{ij}$. Like previous problems, however, it can be shown mathematically that the quantity $\sum (N_{ij} - N\breve p_{ij})^2/(N\breve p_{ij})$ has a sampling distribution that is approximately¹ of the form of the $\chi^2$ distribution with the degrees of freedom $n$ equal to $k - c - 1$, $k$ being the number of classes and $c$ the number of additional restrictions imposed by the process of estimating the underlying probabilities. The number of these restrictions is always equal to the number of the class probabilities estimated from the sample data (i.e., the number of estimating equations). Another way of finding the degrees of freedom is to note the number of cells in the cross classification, or contingency table (such as Table 36), that can be filled in arbitrarily without changing the marginal totals. Since one cell must be left in each row and one in each column in order to make the entries in each row or column add up to the given marginal totals, it follows that a table of $r$ rows and $c$ columns will yield $(r - 1)(c - 1)$ degrees of freedom. (Here $c$ denotes the number of columns, not the number of restrictions.)

The problem in hand, then, may readily be solved as follows: From Tables 36 and 38, it is found that

$$\sum \frac{(N_{ij} - N\breve p_{ij})^2}{N\breve p_{ij}} = \frac{(380 - 420)^2}{420} + \frac{(150 - 120)^2}{120} + \frac{(70 - 60)^2}{60} + \frac{(320 - 280)^2}{280} + \frac{(50 - 80)^2}{80} + \frac{(30 - 40)^2}{40} = 32.44.$$

Examining the $\chi^2$ table for $n = 6 - 3 - 1 = 2$, it is seen that $\chi^2 = 32.44$ lies far beyond the .05 point. Hence, on the assumption that the upper .05 tail of the distribution is taken as the region of rejection, the hypothesis of independence cannot be accepted in this case; and it may be concluded that there is some association between color and sex in the given city. The general nature of that association is indicated by the sample. This suggests that the percentage of black and other-color males is greater than the percentage of white males, or, to put it another way, that the percentage of white females is greater than the percentage of black females or the percentage of other-color females.

¹ Not only is the $\chi^2$ distribution merely an approximation to the distribution derived from the multinomial distribution, but it must be recalled that the derivation of the multinomial distribution itself is based on the assumption of sampling with replacement. If this is not true in practice, as is usually the case, the multinomial distribution gives merely approximate probabilities that are sufficiently accurate only for samples that are small relative to the population.
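The whole computation (expected frequencies from the marginal totals, the statistic, and the $(r - 1)(c - 1)$ degrees of freedom) fits in a short Python function. This is a sketch of the procedure described above, not a reproduction of any particular program. Each expected frequency is computed as row total × column total / $N$, which is the same as $N\breve p_{\mathrm{I}}\breve p_1$ and so on.

```python
def independence_chi_square(table):
    """Chi-square test of independence for an r x c table of observed counts.

    Returns (statistic, degrees_of_freedom, expected_table)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency = N * (row share) * (column share)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((o - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Table 36: rows are white, black, other colors; columns are males, females.
table_36 = [[380, 320], [150, 50], [70, 30]]
stat, df, expected = independence_chi_square(table_36)
print(f"chi-square = {stat:.2f} with {df} d.f.")
print("expected frequencies:", expected)
```

The expected table reproduces Table 38, and the statistic agrees with the 32.44 computed above. The same function applies unchanged to any cross classification, whatever its number of rows and columns.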
Testing Homogeneity. Tests of independence that are used to determine the homogeneity of a sample are of sufficient importance to warrant special mention. Suppose, for example, that intelligence tests are given to 1,000 students in a coeducational university and also to 1,000 students in a university for men students only. Suppose that the distribution of grades in university A is quite different from that of university B, the latter showing in general a tendency toward higher grades. Because of this result, authorities in B infer that they are getting a more intelligent type of student than university A. Authorities in A, however, claim that the comparison is not fair, since, according to their contention, the women students in A are not generally as intelligent as the men students and the sample from A is consequently not a homogeneous one. Fortunately, it is possible to test the claim of the A authorities by applying the test of independence.

Suppose that the 1,000 grades from university A are cross classified with respect to intelligence grade and sex, with the results shown in Table 39.

TABLE 39.—1,000 STUDENTS CLASSIFIED AS TO SEX AND INTELLIGENCE GRADES

Grades      Men    Women    Totals
91–100       26       27        53
81–90        84       73       157
71–80       120      108       228
61–70       131      125       256
51–60        95       95       190
41–50        48       46        94
31–40        12       10        22
Totals      516      484     1,000

The question to be answered by these data is: Is the distribution of intelligence grades in university A independent of sex, or does sex make a difference? That is, can the distributions of men's and women's grades reasonably be taken to be two samples from a single homogeneous population? If they can, then authorities in A are probably wrong: at least the test of independence does not support their claim. If the distributions of men's and women's grades cannot reasonably be taken to be two samples from a single homogeneous population, then the authorities in A are right and the combination of the men's and women's grades produces a heterogeneous group that is not comparable with the presumably homogeneous group of grades from university B.

What do the data show? To answer this question would require a repetition of the sort of numerical work carried out in the previous example. The reader is therefore left to find out the answer for himself. He will find it a good exercise.¹

CAUTION: This chapter may well be concluded with a note of warning. It is always to be remembered that, when a hypothesis is not disproved, it is not necessarily proved. The $\chi^2$ test of goodness of fit affords a good illustration of this. As already indicated, if a normal curve gives a good fit to a sample histogram, it is no proof that the sample came from a normal population. Any other curve that would give approximately the same theoretical frequencies as the normal curve would be an equally tenable hypothesis. Furthermore, it is to be noted that the $\chi^2$ test takes no account of sign. A histogram may differ from a theoretical curve in a negative manner on one side of a central point and in a positive manner on the other. Still, if the differences are small, the $\chi^2$ test may not reveal this obvious lack of conformity. If a hypothesis is not disproved by one test, it may thus be disproved by another. It is therefore well to examine a hypothesis from all angles before accepting it, even tentatively.

¹ The procedure is to set up the hypothesis of independence and from this and the marginal totals to calculate a set of theoretical frequencies that may be compared with the actual frequencies by the $\chi^2$ test.


CHAPTER XIV 


JOINT SAMPLING FLUCTUATIONS 
IN MEAN AND STANDARD DEVIATION 


Previous sampling analysis has concentrated attention on a single statistic and a single population parameter. Hypotheses regarding that parameter were tested and confidence limits established. Sometimes, however, problems arise in which two or more parameters are involved. For example, the question might be asked whether a given sample could have come from a normal population with a specified mean and a specified standard deviation.¹ A problem might also require joint confidence limits for the mean and standard deviation of a normal distribution. It is with these problems that the present chapter will be concerned. Similar questions could be asked with respect to normal bivariate or multivariate populations or to nonnormal populations in general, but these more difficult problems will not be discussed here.


DERIVATION OF JOINT SAMPLING DISTRIBUTION OF MEAN 
AND STANDARD DEVIATION 


To answer questions about both the mean and the standard deviation of a normal population it is necessary to know the joint sampling distribution of the sample mean and standard deviation. This section will discuss the derivation of such a joint distribution.

¹ Previous analyses of samples asked such questions as: Could the sample have come from some population in which the mean has a specified value? Here the dispersion of the population was not specified, and the problem required the use of the t distribution. In the present problem both the mean and the standard deviation are specified by hypothesis.




Numerical Illustration. For samples of 2 from a normal population with mean 100 and standard deviation 10, the joint distribution of the sample mean and sample standard deviation may readily be found from Fig. 93 (page 318). To find those samples, for example, whose means lie between 95 and 97 and whose standard deviations lie between 2 and 4 (variances between 4 and 16), it is merely necessary to find the samples that lie between the lines AB and A'B' and also between the lines GH and EF and the lines G'H' and E'F'. The procedure is illustrated in Fig. 96. The samples sought are those lying in the squares abcd and a'b'c'd'. The probability of these samples, as given by Fig. 93, is the probability of a sample with a mean between 95 and 97 and a standard deviation between 2 and 4. This probability can be entered in a joint distribution table such as is pictured in Fig. 97. When this has been done for all possible combinations of values for the mean and standard deviation, the result is that shown in Fig. 98. Figure 98 depicts the joint distribution of the mean and standard deviation of samples of 2. For larger samples the joint distribution of the mean and standard deviation looks more like Fig. 99.

Fig. 96.—The sum of probabilities in cell abcd is equal to 981.7/100,000. The sum of probabilities in cell a'b'c'd' is the same. Hence the sum in both together is twice 981.7/100,000, or 1,963.4/100,000. (The probabilities in the figure are expressed in 100,000ths.)

A study of Fig. 98 will reveal that the distribution of sample standard deviations is the same in form, whatever the mean. For samples with means lying between 115 and 117, the probability of a sample having a standard deviation between 0 and 2 is equal to 198.6/100,000, the probability of a sample having a standard deviation between 2 and 4 is 183.6/100,000, the probability of a sample having a mean between 115 and 117 and a standard deviation lying between 4 and 6 is 156.8/100,000, etc. For samples with means lying in another interval, the probabilities of the various standard deviations are 101.2/100,000, 93.5/100,000, 79.9/100,000, etc. These are proportional to the former probabilities. That is, 198.6/101.2 = 183.6/93.5 = 156.8/79.9 = etc., which indicates that the two distributions of sample standard deviations are the same in form. This is true for all columns of Fig. 98, indicating that the distribution of sample standard deviations is independent of the value of the sample mean. This is true in general. For a normal population the sampling distribution of sample standard deviations is thus independent of the value of the sample mean, whatever the size of the sample.¹

Fig. 97.—The probability of samples with a mean between 95 and 97 and a standard deviation between 2 and 4. (The probability in the figure is in 100,000ths.)
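The independence can also be seen empirically. The Python sketch below simulates samples of 2 from a normal population with mean 100 and standard deviation 10, splits them according to whether the sample mean falls within 5 of 100, and compares the share of standard deviations below 2, 4, and 6 in the two groups. Under independence the shares should agree up to sampling error. The split point, the cutoffs, the seed, and the use of the divisor $N$ in the standard deviation are assumptions made for illustration; the independence property does not depend on the choice of divisor.

```python
import math
import random

def sd_shares_by_mean_band(n=2, mean=100.0, sd=10.0, draws=100_000, seed=1,
                           cutoffs=(2.0, 4.0, 6.0)):
    """For samples of size n, return the share of sample standard deviations
    below each cutoff, separately for 'central' means (within 5 of the
    population mean) and 'outer' means."""
    rng = random.Random(seed)
    groups = {"central": [], "outer": []}
    for _ in range(draws):
        xs = [rng.gauss(mean, sd) for _ in range(n)]
        m = sum(xs) / n
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)  # divisor n
        groups["central" if abs(m - mean) < 5.0 else "outer"].append(s)
    return {name: [sum(s < c for s in sds) / len(sds) for c in cutoffs]
            for name, sds in groups.items()}

shares = sd_shares_by_mean_band()
for name, values in shares.items():
    print(name, [round(v, 3) for v in values])
```

Repeating the experiment with a markedly skewed population in place of `rng.gauss` would, in general, show the two rows drifting apart, since the independence holds for normal populations.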
Fig. 98.—Probability set of sample means and sample standard deviations. $\Sigma(P) = 100{,}000$. $N = 2$, $n = 1$.

The sample standard deviation being independent of the sample mean, the joint probability of any given sample mean and any given sample standard deviation is just the product of their two individual probabilities. The equation for this joint distribution is therefore the product of Eqs. (1) and (3) of Chapter X. Accordingly, the joint distribution of the mean and standard deviation is given by

¹ See Appendix to Chap. X, p. 263.
USE OF JOINT SAMPLING DISTRIBUTION OF MEAN AND STANDARD DEVIATION

Testing Hypotheses. The use of the joint sampling distribution of mean and standard deviation for testing hypotheses may be illustrated with reference to Fig. 99, which gives the joint distribution of the mean and standard deviation for random samples of 11 from a normal population in which the mean is equal to 100 and the standard deviation is equal to 10. The question to be considered will be this: If a certain sample of 11 cases has a mean of 95 and a standard deviation of 13, is it reasonable to suppose that it came from a population whose mean is 100 and whose standard deviation is 10?

Choice of Region of Rejection. In answering this question the first step is to adopt a coefficient of risk that will determine the risk of rejecting the hypothesis when it is true. Let this be set at .05. The second step is to choose an appropriate region of rejection. This requires careful study.

Suggested Regions. Various regions of rejection suggest themselves. First are regions of rejection based entirely on the sample mean. Figure 100 shows a balanced region of this kind (call this region Ia), and Fig. 101 shows a region covering only the lower mean values (call this region Ib). A third region of this sort might cover only higher mean values but is not here depicted by a figure (call this region Ic).

А similar set of regions could be base 
Standard deviations. Figure 102 shox 
this kind (call this region IIa), 
covering only larger Standard devi 
IIb). A third region (not shown 


d entirely on the sample 


1 For the standard deviati ‘02 and .98 points of the x? distribution 
are used because the table of x? is so set up that these values, and not the 
:025 points, can be obtained; t. 
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A third set of regions could be based on the statistic 


12 ۷¥ - 3 
Č 


which involves both the sample mean and the sample standard 
deviation.! To note how such regions are set up, consider the 
following relationships: In Fig. 99, X — X is the distance of the 
Sample from the vertical line through the population mean 


t-g N : 
Value X = X, Likewise, ¢ is equal to с 4 NY and is thus 


equal to NE i multiplied by the distance of the sample 


from the horizontal axis, i.e., the X-axis. 


The statistic VN (А =x) 
ГД 


Sent of the angle that the line connecting the sample point with 
€ point (X,0) makes with the horizontal axis (angle В of 
Fig. 104), 
A balanced region of rejection based on the sample value of 
le MO CE =F 
ó 


is thus proportional to the cotan- 


would thus look like that marked off in 


Fig. 104 (call this region IIIa). A similar region based solely 
On negative values of t is shown in Fig. 105 (call this region IIIb). 
third region could be based solely on positive values of ¢ 
ЭЧ is not here depicted by a figure (call this region 1110). 
fourth type of region that is considered (call it region 
«у 15 a region based on a special set of contours known as 
COntours "2 
he quantity X is called the “likelihood ratio." The likeli- 
ofa sample, it will be recalled, is the probability of that 


5 ж 
ample on the basis of certain assumed values of the population 
is (у — 

Чуу And not :05. While the regions of rejection based on the standard 


ion are thus not strietly comparable with those based on the mean, 
E Position is good enough to illustrate the idea of regions of rejection. | 
18 the optimum estimate of в, based on the sample standard deviation 


ang: 
nd is equal to ¢ \/ 


2 Уи | 
m Nevatan, J., and Е. S. Pearson, “On the Use and Interpretation of 
ol ES est Criteria for Purposes of Statistical Inference,” Biometrika, 
; А (1928), pp. 175-240. 
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parameters. The likelihood ratio that Profs. Neyman and 
Pearson suggest is the ratio of the likelihood of the given sample 
on the assumption that the population has the mean and standard 
deviation given by the hypothesis in question to the likelihood 
of the given sample on the assumption that the population mean 
and standard deviation are equal to the joint maximum-likelihood 
estimates of these parameters. This ratio is given by 


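$$\lambda=S^{N}e^{-\frac{N}{2}\left(M^{2}+S^{2}-1\right)} \qquad (2)$$

or

$$\log_{10}\lambda=\frac{N}{2}\left[\log_{10}S^{2}-\left(M^{2}+S^{2}-1\right)(.4343)\right] \qquad (3)$$

where M = (X̄ − X̿)/σ̄, S = σ/σ̄, and .4343 = log₁₀ e.

Equation (3) may be written more succinctly as follows:

$$\log_{10}\lambda=\frac{N}{2}\left(.4343-k\right) \qquad (4)$$

in which

$$k=.4343\left(M^{2}+S^{2}\right)-\log_{10}S^{2} \qquad (5)$$

This is the equation for the λ contours.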
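Equations (2), (4), and (5) can be checked against one another numerically. The Python sketch below (the function name is illustrative) computes k, log₁₀ λ, and λ for the sample used later in this chapter (N = 11, X̄ = 95, σ = 13, tested against X̿ = 100 and σ̄ = 10), using the exact value of log₁₀ e in place of the rounded .4343, and asserts that Eq. (2) gives the same λ as Eqs. (4) and (5).

```python
import math

LOG10_E = math.log10(math.e)  # the text's .4343

def lambda_statistics(n, xbar, s, mu, sigma):
    """Return (k, log10 lambda, lambda) for sample mean xbar and sample SD s
    (divisor n) against the hypothetical population mean mu and SD sigma."""
    m = (xbar - mu) / sigma                                       # M
    big_s = s / sigma                                             # S
    k = LOG10_E * (m ** 2 + big_s ** 2) - math.log10(big_s ** 2)  # Eq. (5)
    log10_lam = (n / 2) * (LOG10_E - k)                           # Eq. (4)
    # Eq. (2) evaluated directly, as a cross-check on the algebra.
    lam_direct = big_s ** n * math.exp(-(n / 2) * (m ** 2 + big_s ** 2 - 1))
    assert math.isclose(10 ** log10_lam, lam_direct, rel_tol=1e-9)
    return k, log10_lam, 10 ** log10_lam

k, log_lam, lam = lambda_statistics(11, 95.0, 13.0, 100.0, 10.0)
print(f"k = {k:.4f}, log10 lambda = {log_lam:.4f}, lambda = {lam:.4f}")
```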
TABLE 40. ILLUSTRATING THE CALCULATIONS NECESSARY FOR GRAPHING λ-CONTOUR REGION OF REJECTION
(Region IV)

(1) σ | (2) σ² | (3) log σ² | (4) (3) − 1.305 | (5) (4) ÷ .004343 | (6) (5) − (2) | (7) ±√(6) | (8) 100 + (7) | (8) 100 − (7)
5.5 | 30.25 | 1.4807 | .1757 | 40.46 | 10.21 | ±3.2 | 103.2 | 96.8
6.0 | 36.00 | 1.5563 | .2513 | 57.86 | 21.86 | ±4.7 | 104.7 | 95.3
7.0 | 49.00 | 1.6902 | .3852 | 88.69 | 39.69 | ±6.3 | 106.3 | 93.7
8.0 | 64.00 | 1.8062 | .5012 | 115.40 | 51.40 | ±7.2 | 107.2 | 92.8
9.0 | 81.00 | 1.9085 | .6035 | 138.96 | 57.96 | ±7.6 | 107.6 | 92.4
10.0 | 100.00 | 2.0000 | .6950 | 160.03 | 60.03 | ±7.8 | 107.8 | 92.2
11.0 | 121.00 | 2.0828 | .7778 | 179.09 | 58.09 | ±7.6 | 107.6 | 92.4
12.0 | 144.00 | 2.1584 | .8534 | 196.50 | 52.50 | ±7.2 | 107.2 | 92.8
13.0 | 169.00 | 2.2279 | .9229 | 212.50 | 43.50 | ±6.6 | 106.6 | 93.4
14.0 | 196.00 | 2.2923 | .9873 | 227.33 | 31.33 | ±5.6 | 105.6 | 94.4
15.0 | 225.00 | 2.3522 | 1.0472 | 241.12 | 16.12 | ±4.0 | 104.0 | 96.0
15.5 | 240.25 | 2.3807 | 1.0757 | 247.68 | 7.43 | ±2.7 | 102.7 | 97.3
The calculations shown in Table 40 are solutions of an equation derived from Eq. (5). By interpolating in Table 41 (page 361) for the value of k for which P_λ is .05, k is found to be .695. Hence, by Eq. (5), if σ̄ equals 10 and X̿ equals 100,

$$.695=.4343\left[\frac{(\bar X-100)^{2}}{100}+\frac{\sigma^{2}}{100}\right]-\log_{10}\sigma^{2}+\log_{10}100$$

and since log₁₀ 100 = 2,

$$.695=.004343(\bar X-100)^{2}+.004343\,\sigma^{2}-\log_{10}\sigma^{2}+2$$

$$(\bar X-100)^{2}=\frac{\log_{10}\sigma^{2}-1.305}{.004343}-\sigma^{2}$$

$$\bar X=100\pm\sqrt{\frac{\log_{10}\sigma^{2}-1.305}{.004343}-\sigma^{2}}$$

This last is the form in which solutions are found for X̄ corresponding to the specified values for σ shown in column (1) of Table 40. The corresponding values for X̄ are shown in the two subcolumns of column (8). The graph of columns (1) and (8) gives the elliptical region depicted in Fig. 106. This is a picture of region IV, which is a λ-contour region.
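The last equation lends itself to mechanical tabulation. The Python sketch below (the helper name is illustrative) rebuilds columns (4) through (8) of Table 40 for any hypothetical X̿ and σ̄. It uses the exact log₁₀ e rather than .4343, so the last digit can differ slightly from the table; for σ = 10 it gives ±7.75 where the table shows ±7.8.

```python
import math

def contour_row(s, k=0.695, mu=100.0, sigma=10.0):
    """Return (lower, upper) sample means on the lambda contour for sample SD s,
    or None if the horizontal line at s does not meet the contour."""
    c = math.log10(math.e) / sigma ** 2                         # .004343 when sigma = 10
    col4 = math.log10(s ** 2) - (math.log10(sigma ** 2) - k)    # log s^2 - 1.305
    col5 = col4 / c
    col6 = col5 - s ** 2                                        # (xbar - mu)^2
    if col6 < 0:
        return None
    half_width = math.sqrt(col6)                                # column (7)
    return mu - half_width, mu + half_width                     # column (8)

for s in [5.5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 15.5]:
    lo, hi = contour_row(s)
    print(f"{s:5.1f}  {lo:6.1f}  {hi:6.1f}")
```

Returning None outside the contour makes it easy to scan σ and find where the region closes at the top and bottom.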

A final set of regions that may be suggested is a type of region that cuts off a corner of the distribution of sample means and standard deviations. Such regions are pictured in Figs. …; they may be called, as a class, regions Va, Vb, and Vc.

Relative Merits of the Various Regions of Rejection. The reason why so many regions of rejection are suggested as possible alternatives is that the best region to choose in any particular instance depends on the circumstances of the problem. The regions discussed are those most likely, under varying circumstances, to be found useful. Their relative advantages and disadvantages may briefly be considered.

All regions having a probability of .05 would lead to a rejection of the hypothesis 5 per cent of the time when it is true. Region I has the disadvantage that it would permit the acceptance of hypotheses even when the sample standard deviation is highly improbable, and region II has the corresponding disadvantage with respect to the mean. Region III might presumably be thought to avoid this disadvantage of regions I and II of including improbable samples; but unfortunately this does not follow. As may be seen from Fig. 104, region III accepts the hypothesis in cases where the standard deviation is very large because the mean happens to be very small or very large at the same time, although both are very unlikely on the basis of a given hypothesis. Also, region III accepts the hypothesis when the sample standard deviation is very small because the mean is also very close to the hypothetical value.

Region IV avoids the disadvantage of accepting hypotheses on the basis of highly improbable samples in that it has a certain elliptical symmetry about the center of the distribution of sample means and variances.

All these regions, it should be repeated, will lead to the rejection of the hypothesis when it is true in 5 per cent of the trials; but only regions like IV will lead to rejection of the hypothesis in every case in which the sample itself is very improbable on the basis of that hypothesis.
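The weakness of regions I and II can be made concrete. The Python sketch below computes the limits of region Ia (a balanced .05 region for the mean) and region IIa (the .02 and .98 points of χ² with N − 1 degrees of freedom, as in the footnote on those regions) for samples of 11 from a population with X̿ = 100 and σ̄ = 10. Because the Python standard library has no χ² routine, the percentile is found here by bisection on a power series for the incomplete gamma function; the helper names are assumptions of this sketch. A sample with mean exactly 100 but standard deviation 20 passes region Ia although region IIa rejects it.

```python
import math
from statistics import NormalDist

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), by the gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    j = 0
    while term > 1e-16 * total:
        j += 1
        term *= z / (a + j)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_ppf(p, df):
    """Percentile of chi-square found by bisection on chi2_cdf."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

n, mu, sigma = 11, 100.0, 10.0
z = NormalDist().inv_cdf(0.975)
se = sigma / math.sqrt(n)
mean_limits = (mu - z * se, mu + z * se)                       # region Ia
sd_limits = tuple(sigma * math.sqrt(chi2_ppf(p, n - 1) / n)    # region IIa
                  for p in (0.02, 0.98))

def rejected_by_Ia(xbar):
    return not (mean_limits[0] < xbar < mean_limits[1])

def rejected_by_IIa(s):
    return not (sd_limits[0] < s < sd_limits[1])

print(mean_limits, sd_limits)
print(rejected_by_Ia(100.0), rejected_by_IIa(20.0))
```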

More important, however, than the criterion that a region should include the more improbable samples is the criterion that a region should lead to rejection of the hypothesis most frequently when the hypothesis is not true. Which region fares best on this more important criterion will depend on the nature of the problem in hand. If the investigator desires to avoid acceptance of the hypothesis especially when the true value of the mean is less than the hypothetical mean being tested, region Ib should be used, for this will lead to rejection of the hypothesis more often when the true mean is less than the hypothetical mean.¹ Region IIIb is also a good region to use in this case, but it does not appear to be as good as region Ib, based only on the means.

On the other hand, if the investigator wishes to avoid acceptance of the hypothesis especially when the true value of σ̄ is greater than the hypothetical value, region IIb should be used, for as σ̄ is increased, the distribution is stretched upward and its vertical mean moves higher. In this instance, region IIa, for certain values of σ̄, and region IIc, for all high values, are worse than useless, since if the true σ̄ were greater than the hypothetical σ̄ the probability of a sample falling in the region of rejection would be less than if the hypothetical σ̄ were the true σ̄. Thus, if region IIc is used under these circumstances, there is more chance of accepting the hypothesis when it is not true (because σ̄ is actually larger) than there would be if it were true. On the other hand, region IIa, for certain values of σ̄, or region IIc, for all lower values, is not so bad in cases in which the true σ̄ is less than the hypothetical σ̄.

¹ For as the distribution is shifted to the left by decreasing X̿, a larger percentage of the samples tends to fall in the established region of rejection.

Instances in which region IIa or IIb … useful are those in which the investigator … true σ̄ is less than … the distribution is distorted so that more … in the established region of rejection.

… a compromise region that tends to give a high probability of rejection in whatever manner the true means and standard deviations differ from their hypothetical values. For such instances region IV has been offered as an especially good region.

The region of rejection … would probably be one that … , for example, would be concerned about getting less mileage on the average than desired and … . Of course, other cases would arise in which the investigator would wish to avoid acceptance of the hypothesis when in fact the mean was greater than the hypothetical mean, whether the standard deviation was greater or less. It is believed, however, that the former instance is more common.

A region bounded by a smooth continuous curve would seem to be the best type of region to adopt for the reason explained in the preceding paragraph. Such a region is depicted in Fig. 107. But to find a curve that would permit the ready calculation of probabilities … involve the same difficulties as one like that shown in Fig. 107.
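The comparison of regions Ib and IIIb for a true mean below the hypothetical mean can be checked by simulation. The Python sketch below assumes that the true σ̄ equals the hypothetical 10, that N = 11, and that the coefficient of risk is .05; it uses the tabled lower 5 per cent point of t for 10 degrees of freedom (−1.812). The function name, seed, and number of trials are arbitrary. Region Ib rejects when (X̄ − 100)/(10/√11) falls below the lower 5 per cent normal point; region IIIb rejects when t = √(N − 1)(X̄ − 100)/σ falls below −1.812.

```python
import math
import random
from statistics import NormalDist

def rejection_rates(true_mean, n=11, mu0=100.0, sigma=10.0,
                    t_crit=-1.812, trials=20000, seed=1):
    """Simulated probabilities that regions Ib and IIIb reject H: mean = mu0.
    t_crit is the lower 5 per cent point of t with n - 1 d.f. (n = 11 here)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.05)                        # about -1.645
    hits_ib = hits_iiib = 0
    for _ in range(trials):
        xs = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)    # divisor N
        if (xbar - mu0) / (sigma / math.sqrt(n)) < z_crit:      # region Ib
            hits_ib += 1
        if math.sqrt(n - 1) * (xbar - mu0) / s < t_crit:        # region IIIb
            hits_iiib += 1
    return hits_ib / trials, hits_iiib / trials

print(rejection_rates(100.0))   # hypothesis true
print(rejection_rates(95.0))    # true mean below the hypothetical mean
```

Region Ib gains here because it uses the known hypothetical σ̄ instead of estimating it from the sample; if the true σ̄ differed from 10 the comparison could change.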


TABLE 41. PROBABILITY OF A VALUE OF λ AS GREAT AS, OR GREATER THAN, SPECIMEN VALUES FOR VARIOUS VALUES OF k = .4343 − (2/N) log λ*

N = 3: k | P_λ || N = 4: k | P_λ || N = 5: k | P_λ
1.50 | .0830 || 1.25 | .0578 || 1.05 | .0568
… | .0415 || 1.30 | .0486 || 1.10 | .0450
2.40 | .0106 || 1.50 | .0243 || 1.40 | .0113
2.70 | .0053 || 1.80 | .0086 || 1.45 | .0090

N = 6: k | P_λ || N = 7: k | P_λ || N = 8: k | P_λ
.90 | .0666 || .85 | .0551 || .80 | .0512
.95 | .0499 || .90 | .0389 || .85 | .0341
1.20 | .0117 || 1.05 | .0137 || 1.00 | .0101
1.25 | .0088 || 1.10 | .0097 || 1.05 | .0068

N = 9: k | P_λ || N = 10: k | P_λ || N = 11: k | P_λ
… | .0534 || .70 | .0625 || .65 | .0822
.80 | .0336 || .75 | .0371 || .70 | .0461
… | .0133 || .85 | .0131 || .80 | .0145
… | .0084 || .90 | .0078 || .85 | .0081

* Abridged from more elaborate tables in J. Neyman and E. S. Pearson, "On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference," Biometrika, Vol. 20A (1928), pp. 175–240; see pp. 238–240.


Illustrations. The two regions whose use will be illustrated here will be region IV, based on the λ contours, and region Vb. These regions are the ones that would probably be used in most instances when a joint hypothesis was being tested. Region IV would be used when the investigator is indifferent to what the actual values of X̿ and σ̄ might be, and region Vb would be used when the investigator is especially troubled about these values deviating from the hypothetical values in a particular direction.

Regions based entirely on either the mean or the standard deviation alone would probably not be used in cases in which a joint hypothesis is being tested. It is unlikely that the investigator would be unconcerned about the actual value of the standard deviation, say, when the hypothesis does involve this quantity.


TABLE 42. VALUES OF THE RATIO P_λ/λ FOR VARIOUS VALUES OF k*

k | P_λ/λ || k | P_λ/λ || k | P_λ/λ
.435 | 1.0008 || .520 | … || .605 | 1.2021
.440 | 1.0062 || .525 | 1.1024 || .610 | 1.2089
.445 | 1.0116 || .530 | 1.1084 || .615 | 1.2155
.450 | 1.0170 || .535 | 1.1144 || .620 | 1.2221
.455 | 1.0224 || .540 | 1.1205 || .625 | 1.2287
.460 | 1.0279 || .545 | … || .630 | 1.2353
.465 | 1.0334 || .550 | 1.1328 || .635 | 1.2420
.470 | 1.0390 || .555 | 1.1390 || .640 | 1.2488
.475 | 1.0446 || .560 | 1.1452 || .645 | 1.2558
.480 | 1.0502 || .565 | 1.1514 || .650 | 1.2623
.485 | 1.0559 || .570 | 1.1576 || .700 | 1.332
.490 | 1.0616 || .575 | 1.1639 || .750 | 1.407
.495 | 1.0673 || .580 | 1.1702 || .800 | 1.485
.500 | 1.0731 || .585 | 1.1766 || .850 | 1.569
.505 | 1.0789 || .590 | 1.1830 || .900 | 1.658
.510 | 1.0847 || .595 | 1.1894 || .950 | 1.758
.515 | 1.0906 || .600 | 1.1959 || 1.000 | 1.855

* From Table XII in J. Neyman and E. S. Pearson, "On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference," Biometrika, Vol. 20A (1928), pp. 175–240.
The Probability Distribution of λ. To make use of λ contours for regions of rejection, the probability distribution of λ must be determined. This has been done by Neyman and Pearson, and the results are condensed in Tables 41 and 42. For N = 3 to N = 11 and for various values of k, the probability P_λ of getting as great or greater λ by chance is given in Table 41.¹ These have been abridged from a more elaborate table worked out by Neyman and Pearson. For larger values of N, Table 42 is found useful. This gives, for various values of k, the ratio P_λ/λ … .

¹ It will be noted that P_λ is the probability of as great or greater value of λ. If it had been written P(λ) it would have meant simply the probability of λ. Thus P_λ is the probability of a range of values beginning at λ and extending to infinity.

Testing a Hypothesis with a λ Contour. To make use of these fundamental relationships in an actual problem proceed as follows: Calculate M = (X̄ − X̿)/σ̄ and S = σ/σ̄ and

k = .4343(M² + S²) − log₁₀ S².

If N is equal to or less than 10, use Table 41 to note whether the probability of as great or greater λ [given by log λ = (N/2)(.4343 − k)] is less than or equal to .05 (or .01 if this is taken as the coefficient of risk). If it is less than or equal to the coefficient of risk, then the given sample must fall in the region of rejection marked off by the .05 (or .01) λ contour. If N is greater than 10, find from Table 42 the value of P_λ/λ corresponding to the value of k computed for the sample. Then compute log λ = (N/2)(.4343 − k), and multiply P_λ/λ by this value of λ. The result is P_λ, the probability of as great or greater value of λ. If this is less than the adopted coefficient of risk, then the sample falls in the region of rejection marked off by the λ contour.

As an illustration this procedure may be applied to the problem stated above on page 344. In the sample, N = 11, X̄ = 95, and σ = 13. It is desired to test the hypothesis that the mean of the population from which this sample was drawn is 100 and its standard deviation 10. The investigator is indifferent as to whether the actual values of the population mean and population standard deviation are greater or less than 100 and 10, respectively, and he decides upon a coefficient of risk of .05.

For these data, M = (95 − 100)/10 = −.5, and S = 13/10 = 1.3. Hence M² = .25, S² = 1.69, and

k = (.4343)(.25 + 1.69) − .2279 = .6146.

Looking up this value of k in Table 41, it is found that, for N = 11 and k = .65, P_λ = .082. Hence, for k = .6146, P_λ must be greater than .08 and certainly greater than .05. Accordingly, the sample in question does not fall in the region of rejection, and the hypothesis is not rejected.

Since N = 11, Table 42 could have been used as well as Table 41. Interpolation in Table 42 between k = .610 and k = .615 shows that, for k = .6146, P_λ/λ = 1.2150. For this given value of k, the value of log λ is (11/2)(.4343 − .6146) = −.9922, or 9.0078 − 10. This gives λ = .1018. Multiplying the value of P_λ/λ by this value of λ gives P_λ = (1.2150)(.1018) = .1237, which corroborates the previous conclusion that P_λ was greater than .08.
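The tabled probabilities can be spot-checked by simulation. Under the hypothesis, the distribution of k does not depend on the particular values of X̿ and σ̄, so the Python sketch below draws samples of 11 from a standard normal population, computes k for each, and reports the proportion of samples whose k is at least a given value (equivalently, whose λ is at least as small). The function names, seed, and number of trials are arbitrary choices of this sketch. The proportion for k = .695 should come out near .05, for k = .65 near .08, and for the sample value k = .6146 near .12.

```python
import math
import random

LOG10_E = math.log10(math.e)

def k_statistic(xs, mu=0.0, sigma=1.0):
    """k = .4343(M^2 + S^2) - log10 S^2 for one sample, SD with divisor N."""
    n = len(xs)
    xbar = sum(xs) / n
    var = sum((x - xbar) ** 2 for x in xs) / n
    m2 = ((xbar - mu) / sigma) ** 2
    s2 = var / sigma ** 2
    return LOG10_E * (m2 + s2) - math.log10(s2)

def simulate_k(n=11, trials=50000, seed=7):
    """k values for many random samples of n drawn under the hypothesis."""
    rng = random.Random(seed)
    return [k_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(trials)]

def tail_proportion(ks, k0):
    """Proportion of simulated k values >= k0 (lambda as small or smaller)."""
    return sum(k >= k0 for k in ks) / len(ks)

ks = simulate_k()
print(tail_proportion(ks, 0.695), tail_proportion(ks, 0.65), tail_proportion(ks, 0.6146))
```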
The calculations in Table 40 have illustrated the construction of a λ-contour region of rejection for the testing of a hypothesis as to the mean and standard deviation of the population. The resulting λ-contour region of rejection (region IV) is graphically depicted in Fig. 106. It will be noted that the λ contour is roughly an ellipse, symmetrical about the vertical line through X̿ and centered near the intersection of σ̄ and X̿.¹ Table 40 and Fig. 106 are presented as an aid in visualizing the principle involved in these problems; such a table and figure need not, of course, be constructed for an actual problem. The problem illustrated above was solved without the use of such a figure.
Use of a Corner Region Illustrated. The foregoing has been based on the assumption that the investigator is indifferent as to whether the true values of the mean and standard deviation are greater or less than the hypothetical values. Suppose, however, that the investigator is more concerned about accepting the hypothesis when the true mean is less than the hypothetical mean and the true standard deviation is greater than the hypothetical standard deviation. The region of rejection should then be placed in the corresponding upper quadrant of the distribution of samples. Logically, it would seem appropriate to find the λ contour that marked off a .05 region in this upper quadrant. Unfortunately, this presents such great mathematical difficulties that the procedure is impracticable. The following discussion is therefore offered as a crude substitute:

¹ The ellipse obtained in this way is different from that of Fig. 111. The latter gives joint values for the population mean and standard deviation for given values of the sample mean and sample standard deviation (see p. 369).
The determination of a corner region of rejection may be
explained with reference to the following problem: Suppose it is
claimed that a new method of cultivating potatoes in a given
locality will average a yield of 100 bushels per acre with a standard
deviation of 10 bushels. Suppose that this offers a gain over
the old method of cultivation in that the mean is greater and the
standard deviation is less. To test this claim a farmer uses the
new method of cultivation on 10 plots of ground of the same
size and finds that the mean yield is 97 bushels and the standard
deviation is 13.5 bushels. Can he reasonably reject the hypothesis
that the average yield will in the long run be 100 bushels and
the standard deviation 10 bushels?

If the true mean of the population is 100 bushels and the true standard deviation is 10 bushels, 50 per cent of samples of 10 will have an average yield equal to or less than 100. Likewise, if the true standard deviation is 10 bushels, 50 per cent of samples of 10 will have standard deviations equal to or greater than 9.15.* Accordingly, the probability of a sample of 10 having a mean less than 100 and a standard deviation greater than 9.15 is .25.

If two other values for the mean and standard deviation can be found such that 20 per cent of the samples have means lying between 100 and this second mean value and standard deviations lying between 9.15 and this second standard deviation value, these second values for the mean and standard deviation will mark off a .05 region in the upper quadrant that would appear to be a good region of rejection for the present problem. The square root of .20 is .4472. Therefore, if a second mean value can be found such that .4472 of the samples have means lying between this value and 100, and a second standard deviation value can be found such that .4472 of the samples have standard deviations lying between this second standard deviation value and 9.15, the two values for the mean and standard deviation so determined will mark off the desired region of rejection.

A table of normal probabilities shows that .4472 of the samples will fall between the mean and the mean less 1.618 standard errors. Since the standard deviation of the mean is σ̄/√N, it would have the value 10/√10 = 3.16. Hence, .4472 of the samples would fall between 100 and 100 − 1.618(3.16) = 94.9.

* For n = 9, the probability is 50 per cent that a χ² will exceed 8.343; therefore the probability is 50 per cent that Nσ²/σ̄² = 10σ²/10² will exceed 8.343, or that σ will exceed √83.43 = 9.15.
Similarly, a table of the χ² distribution¹ shows that, for n = 9, .4472 of the cases lie between the median value 8.343 and the value χ² = 16.774. Since Nσ²/σ̄² has a sampling distribution of the form of the χ² distribution, this means that .4472 of the samples will have a σ lying between 9.15 and 12.95; for 10σ²/100 = 16.774 gives σ = 12.95.

Accordingly, the desired .05 region of rejection will be such as the shaded area of Fig. 110. It will include all samples whose means are less than 100 and whose standard deviations are greater than 9.15, except those whose means lie between 94.9 and 100 and whose standard deviations lie between 9.15 and 12.95. Since the farmer's sample has a mean of 97 and a standard deviation of 13.5, it falls in this region of rejection and the hypothesis must be rejected.

Fig. 110. An example of a corner region of rejection.
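The arithmetic of the corner region can be reproduced in a few lines. The Python sketch below computes the two cut-off values for N = 10, X̿ = 100, σ̄ = 10 and tests the farmer's sample; the helper names are illustrative, and the χ² percentile is found by bisection on a gamma-function series because the Python standard library has no χ² routine. Computed directly, the median standard deviation comes out near 9.13 rather than the 9.15 used in the text, and the upper standard-deviation cut-off near 12.94; neither difference affects the conclusion.

```python
import math
from statistics import NormalDist

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), by the gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    j = 0
    while term > 1e-16 * total:
        j += 1
        term *= z / (a + j)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_ppf(p, df):
    """Percentile of chi-square found by bisection on chi2_cdf."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def corner_region(n=10, mu=100.0, sigma=10.0, size=0.05):
    """Cut-offs for a corner region: mean below mu and SD above its median,
    minus an inner rectangle holding (.25 - size) of all samples."""
    q = math.sqrt(0.25 - size)                    # .4472 when size = .05
    se = sigma / math.sqrt(n)                     # standard deviation of the mean
    mean_cut = mu - NormalDist().inv_cdf(0.5 + q) * se
    sd_median = sigma * math.sqrt(chi2_ppf(0.5, n - 1) / n)
    sd_cut = sigma * math.sqrt(chi2_ppf(0.5 + q, n - 1) / n)
    return mean_cut, sd_median, sd_cut

def in_corner_region(xbar, s, mu=100.0, **kw):
    mean_cut, sd_median, sd_cut = corner_region(mu=mu, **kw)
    in_quadrant = xbar < mu and s > sd_median
    in_inner_rectangle = xbar >= mean_cut and s <= sd_cut
    return in_quadrant and not in_inner_rectangle

print(corner_region())
print(in_corner_region(97.0, 13.5))
```

Because the mean and standard deviation are independent, the quadrant holds .25 of the samples and the inner rectangle .4472² = .20, leaving .05 in the corner region.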


Determination of Joint Confidence Zone for the Mean and Standard Deviation.

… and the probability of its including the true value of this parameter was called the "confidence coefficient."

In a similar fashion, when two parameters are being estimated
jointly, it is possible to lay off joint ranges of values that may be
said to include the true population values of these parameters
with a given degree of probability. This joint range of values is
called a "confidence zone," and the probability associated with
it is its confidence coefficient.

It will be recalled that the upper limit of the confidence
interval for a single parameter was taken as the value of the
parameter that would make the probability of the given sample
result or a lower value just equal to a predetermined figure, say
.025. Likewise, the lower confidence limit was the value of the
parameter that would make the probability of the sample result
or a higher value just equal to another predetermined quantity,
again say .025. The sum of these two probabilities is equal to
the selected coefficient of risk, and the complement of the sum
is the confidence coefficient, say .95.

Another way of looking at it is that the upper confidence 
limit is so chosen that the sample would fall exactly on the upper 
boundary of the lower region of rejection, and the lower con- 
fidence limit is so chosen that the sample would fall exactly on the 
lower boundary of the upper region of rejection. 

In order to determine a joint confidence zone for two parameters
a similar procedure is possible. The limits of the zone are
determined such that if the population parameters have any
of these limiting values then the sample will fall on the boundary
of the region of rejection. In the case of the mean and standard
deviation of the population a confidence zone with a confidence
coefficient of .95 would be determined by finding the values of
these parameters that would make the given sample fall on the
λ contour marking the .05 region of rejection. The equation
for the λ contours, it will be recalled,¹ is as follows:


$$\log_{10}\lambda=\frac{N}{2}\left[\log_{10}S^{2}-\left(M^{2}+S^{2}-1\right)(.4343)\right] \qquad (3)$$

or

$$k=.4343\left(M^{2}+S^{2}\right)-\log_{10}S^{2} \qquad (5)$$

in which k = .4343 − (2/N) log₁₀ λ, M = (X̄ − X̿)/σ̄, and S = σ/σ̄.

¹ See p. 353.
If hypothetical values are assigned to X̿ and σ̄ and if λ (or k) is given the value that makes the probability of as great or greater λ just equal to .05, the equation above will represent the boundary line of the .05 region of rejection to be used in testing the given hypothesis. If, however, a particular sample value is substituted for X̄ and σ and if λ (and hence k) is given its .05 value, then this equation will show what hypothetical values of X̿ and σ̄ would cause the given sample to fall exactly on the boundary line of the .05 region of rejection. When so used the equation becomes the formula for the boundary of the joint confidence zone for the mean and standard deviation.

Suppose, for example, that a random sample of 11 cases is found to have a mean of 95 and a standard deviation of 13. What is the joint confidence zone that may be said to cover the true values of the population mean and standard deviation with a probability of .95? For samples of 11, the value of k for which the probability of as great or greater λ is just .05 is found from interpolation in Table 41 to be approximately .695. On substituting this value of k and the given sample values of X̄ and σ in Eq. (5), the boundary of the joint confidence zone for the mean and standard deviation of the population is given by the following,

$$.695=.4343\left[\frac{(95-\bar{\bar X})^{2}}{\bar\sigma^{2}}+\frac{169}{\bar\sigma^{2}}\right]-\log_{10}\frac{169}{\bar\sigma^{2}} \qquad (9)$$

which, on solving for X̿, gives

$$\bar{\bar X}=95\pm\sqrt{-169+2.3026\,\bar\sigma^{2}\left(2.9229-\log_{10}\bar\sigma^{2}\right)}$$

By substituting various values for σ̄ and obtaining the corresponding values of X̿, a graph may be drawn of the boundary line of the desired confidence zone. This has been done in Table 43 and depicted in a graph in Fig. 111.

The result so obtained may be interpreted as follows: The set of X̿, σ̄ values included within the oval-shaped curve of Fig. 111 constitutes the .95 joint confidence zone for the mean and standard deviation of the population. Accordingly, any pair of values within the curve is a reasonable hypothesis as to the true values of the mean and standard deviation of the population; it would be a hypothesis that would not cause the given sample to fall in the .05 region of rejection of Fig. 106.

Any pair of values outside the oval-shaped curve of Fig. 111 would be an unreasonable hypothesis as to the true values of the mean and standard deviation of the population. The curve shows that the maximum tenable value for the mean of the population is approximately 107 and the minimum tenable value is approximately 83. It also shows that the maximum tenable value for the standard deviation of the population is approximately 25.3 and the minimum tenable value is approximately 8.2.

Fig. 111. The .95 joint confidence zone for X̿ and σ̄, given X̄ = 95, σ = 13, and N = 11.

It is to be noted, however, that these extreme values for the mean and standard deviation of the population are not jointly tenable. If the mean should be assumed to have the value 107, the only tenable value for the standard deviation of the population would be 17.5. If the standard deviation should be assumed to have the value 25.3, the only tenable value for the mean of the population would be 95. Joint tenability, of course, is the very essence of the diagram. It shows what range of values for one parameter is jointly tenable with a given value of the other parameter.

If the mean is assumed to be 85, for example, then the values of the standard deviation that are jointly tenable with this value of the mean are the values from 13 to 22. Likewise, if the standard deviation is assumed to have the value 20, say, the values of the mean that are then jointly tenable are the values from 84 to 106.


TABLE 43. ILLUSTRATING THE CALCULATIONS NECESSARY FOR THE GRAPHING OF A JOINT CONFIDENCE ZONE FOR THE MEAN AND STANDARD DEVIATION OF THE POPULATION*

(1) σ̄ | (2) σ̄² | (3) log σ̄² | (4) 2.9229 − (3) | (5) 2.3026 × (2) | (6) (4) × (5) | (7) (6) − 169 | (8) ±√(7) | (9) 95 + (8) | (9) 95 − (8)
9 | 81 | 1.9085 | 1.0144 | 186.5106 | 189.20 | 20.20 | ±4.5 | 99.5 | 90.5
10 | 100 | 2.0000 | .9229 | 230.2600 | 212.51 | 43.51 | ±6.6 | 101.6 | 88.4
11 | 121 | 2.0828 | .8401 | 278.6146 | 234.06 | 65.06 | ±8.1 | 103.1 | 86.9
12 | 144 | 2.1584 | .7645 | 331.5744 | 253.49 | 84.49 | ±9.2 | 104.2 | 85.8
13 | 169 | 2.2279 | .6950 | 389.1394 | 270.45 | 101.45 | ±10.1 | 105.1 | 84.9
14 | 196 | 2.2923 | .6306 | 451.3096 | 284.60 | 115.60 | ±10.8 | 105.8 | 84.2
15 | 225 | 2.3522 | .5707 | 518.0850 | 295.67 | 126.67 | ±11.3 | 106.3 | 83.7
16 | 256 | 2.4082 | .5147 | 589.4656 | 303.40 | 134.40 | ±11.6 | 106.6 | 83.4
17 | 289 | 2.4609 | .4620 | 665.4514 | 307.44 | 138.44 | ±11.8 | 106.8 | 83.2
18 | 324 | 2.5105 | .4124 | 746.0424 | 307.67 | 138.67 | ±11.8 | 106.8 | 83.2
19 | 361 | 2.5575 | .3654 | 831.2386 | 303.73 | 134.73 | ±11.6 | 106.6 | 83.4
20 | 400 | 2.6021 | .3208 | 921.0400 | 295.47 | 126.47 | ±11.2 | 106.2 | 83.8
21 | 441 | 2.6444 | .2785 | 1015.4466 | 282.80 | 113.80 | ±10.7 | 105.7 | 84.3
22 | 484 | 2.6848 | .2381 | 1114.4584 | 265.35 | 96.35 | ±9.8 | 104.8 | 85.2
23 | 529 | 2.7235 | .1994 | 1218.0754 | 242.88 | 73.88 | ±8.6 | 103.6 | 86.4
24 | 576 | 2.7604 | .1625 | 1326.2976 | 215.52 | 46.52 | ±6.8 | 101.8 | 88.2
25 | 625 | 2.7959 | .1270 | 1439.1250 | 182.77 | 13.77 | ±3.7 | 98.7 | 91.3

* The equation to which the calculations above give solutions is as follows:

$$.695=.4343\left[\frac{(95-\bar{\bar X})^{2}}{\bar\sigma^{2}}+\frac{169}{\bar\sigma^{2}}\right]-\log_{10}169+\log_{10}\bar\sigma^{2}$$

or

$$\bar{\bar X}=95\pm\sqrt{-169+2.3026\,\bar\sigma^{2}\left(2.9229-\log_{10}\bar\sigma^{2}\right)}$$

This equation is Eq. (5) with k = .695, N = 11, X̄ = 95, and σ = 13.
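The boundary in Table 43 can be generated, and its extremes located, with the short Python sketch below. It solves Eq. (5) for X̿ given a trial σ̄ (k = .695, X̄ = 95, σ = 13), using the exact log₁₀ e; the helper names and the scanning step are choices of this sketch. Scanning σ̄ in steps of .01 finds the zone extending from roughly σ̄ = 8.2 to σ̄ = 25.4, so its lower end lies below the σ̄ = 9 at which Table 43 begins.

```python
import math

LOG10_E = math.log10(math.e)

def zone_limits(pop_sd, xbar=95.0, s=13.0, k=0.695):
    """Population means on the boundary of the joint confidence zone for a
    trial population SD, or None if that SD lies outside the zone."""
    # From Eq. (5): (xbar - m)^2 + s^2 = pop_sd^2 * (k + log10(s^2 / pop_sd^2)) / .4343
    sq = pop_sd ** 2 * (k + math.log10(s ** 2 / pop_sd ** 2)) / LOG10_E - s ** 2
    if sq < 0:
        return None
    half = math.sqrt(sq)
    return xbar - half, xbar + half

def sd_extent(step=0.01, lo=1.0, hi=60.0):
    """Smallest and largest trial SDs (on a grid) that fall inside the zone."""
    grid = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    inside = [sd for sd in grid if zone_limits(sd) is not None]
    return min(inside), max(inside)

for pop_sd in range(9, 26):
    low, high = zone_limits(pop_sd)
    print(f"{pop_sd:3d}  {low:6.1f}  {high:6.1f}")
print(sd_extent())
```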

Joint Maximum-likelihood Estimates of the Mean and Standard Deviation. A final use of the joint distribution of the mean and standard deviation is to make joint maximum-likelihood estimates of the mean and standard deviation of the population.

The equation for this joint distribution is Eq. (1). This equation may be viewed from two aspects. For a given value for each of the population parameters X̿ and σ̄, it may be viewed as giving the probability of getting various sample values for the statistics X̄ and σ; for given sample values of X̄ and σ, it may be viewed as giving the probability of getting such a sample result for various hypothetical values of the parameters X̿ and σ̄.

In testing hypotheses and determining confidence limits the first view of the equation has been adopted and illustrated; in determining maximum-likelihood estimates it is the second aspect of the equation that is significant. For, by definition, the maximum-likelihood estimates of the population mean and standard deviation are the values of these parameters that will make the probability of the given sample a maximum. Thus a study is made of how the probability of the sample varies with different hypothetical values for X̿ and σ̄, and the values that make this probability the greatest are the maximum-likelihood estimates. Algebraically the procedure is as follows:

Since the probability of the given sample will be a maximum when its logarithm is a maximum, the first step is to simplify Eq. (1) by taking logarithms. This gives the following result:

$$\log P(\bar X,\sigma)=-\log\sigma_{\bar X}-\frac{(\bar X-\bar{\bar X})^{2}}{2\sigma_{\bar X}^{2}}-\frac{N\sigma^{2}}{2\bar\sigma^{2}}-(N-1)\log\bar\sigma+\text{other terms not involving }\bar{\bar X}\text{ or }\bar\sigma \qquad (6)$$

or, since σ_X̄ = σ̄/√N,

$$\log P(\bar X,\sigma)=-\frac{N(\bar X-\bar{\bar X})^{2}}{2\bar\sigma^{2}}-\frac{N\sigma^{2}}{2\bar\sigma^{2}}-N\log\bar\sigma+\text{other terms not involving }\bar{\bar X}\text{ or }\bar\sigma$$

The values of X̿ and σ̄ that make this a maximum will be those for which the partial derivatives with respect to these parameters are equal to zero. Taking these derivatives and setting them equal to zero give the following results:

$$\frac{2N(\bar X-\bar{\bar X})}{2\bar\sigma^{2}}=0,\qquad -\frac{N}{\bar\sigma}+\frac{N(\bar X-\bar{\bar X})^{2}+N\sigma^{2}}{\bar\sigma^{3}}=0 \qquad (7)$$

The common solutions of these two equations are X̿ = X̄ and σ̄ = σ. These, then, are the joint maximum-likelihood estimates of the mean and standard deviation of the population.
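The result can be confirmed numerically. The Python sketch below evaluates the second form of Eq. (6), dropping the terms that do not involve X̿ or σ̄, for a sample with N = 11, X̄ = 95, σ = 13, and searches a grid of hypothetical X̿ and σ̄ values; the maximum falls at X̿ = 95, σ̄ = 13, where both derivatives of Eq. (7) vanish. The grid limits and step are arbitrary choices of this sketch.

```python
import math

def log_likelihood(pop_mean, pop_sd, n=11, xbar=95.0, s=13.0):
    """Second form of Eq. (6) without the terms free of pop_mean and pop_sd."""
    return (-n * (xbar - pop_mean) ** 2 / (2 * pop_sd ** 2)
            - n * s ** 2 / (2 * pop_sd ** 2) - n * math.log(pop_sd))

def gradient(pop_mean, pop_sd, n=11, xbar=95.0, s=13.0):
    """The two partial derivatives that Eq. (7) sets equal to zero."""
    d_mean = 2 * n * (xbar - pop_mean) / (2 * pop_sd ** 2)
    d_sd = -n / pop_sd + (n * (xbar - pop_mean) ** 2 + n * s ** 2) / pop_sd ** 3
    return d_mean, d_sd

grid = [(90 + 0.05 * i, 10 + 0.05 * j) for i in range(201) for j in range(121)]
best = max(grid, key=lambda p: log_likelihood(*p))
print(best, gradient(95.0, 13.0))
```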


CHAPTER XV 
SAMPLING FLUCTUATIONS IN REGRESSION STATISTICS 


Lines and planes of regression are commonly used devices to 
estimate one variable from one or more other variables. It is 
therefore desirable to study sampling fluctuations in regression 
statistics, higher-order variances, and lines and planes of regression
in order to determine the accuracy of the estimates made
from them. Such a study is the purpose of this chapter. 


MAXIMUM-LIKELIHOOD ESTIMATES 

OF REGRESSION STATISTICS AND HIGHER-ORDER VARIANCES 

Two Variables. The problem to be considered here is this: 
If a random sample has been taken from a given bivariate popula- 
tion, what are the best estimates that may be made of the 
regression parameters of the population and how will these 
estimates fluctuate from sample to sample? In what follows 
it will be assumed that the population is a normal population 
in which the lines of regression are the loci of mean values and the 
distributions of cases around these lines are normal distributions. 

Consider first the line of regression of $X_1$ on $X_2$, and let the equation of this line in the population be represented by the equation

$$X_1' = \tilde a_{1.2} + \tilde b_{12}X_2 \qquad (1)$$

The problem is to estimate $\tilde a_{1.2}$ and $\tilde b_{12}$ and to determine the sampling fluctuations of these estimates. This section will be devoted to the first of these problems. Although the analysis relates to only one of the lines of regression, it is equally applicable to the other.

As in other cases the method of solving this problem will be the method of maximum likelihood. In other words, the estimates of $\tilde a_{1.2}$ and $\tilde b_{12}$ will be those that make the logarithm of the probability of the given sample a maximum. The procedure is as follows: First note that any pair of $X_1, X_2$ values may be represented by a point in a plane, such as point P in Fig. 112. If values are assigned to $\tilde a_{1.2}$ and $\tilde b_{12}$, the line of regression represented by Eq. (1) will be a straight line in this plane. The vertical deviation of the point P from the line of regression will be equal to

$$v = X_1 - X_1' = X_1 - \tilde a_{1.2} - \tilde b_{12}X_2$$

For each pair of $X_1, X_2$ values a vertical deviation from the line of regression can thus be calculated, so that a sample set of $X_1, X_2$ pairs of values can be translated into a sample set of deviations from the line of regression. If the values assigned to $\tilde a_{1.2}$ and $\tilde b_{12}$ are such as to make the logarithm of the probability of the corresponding set of deviations a maximum, they will also be such as to make the logarithm of the probability of the set of sample $X_1, X_2$ pairs of values a maximum, and vice versa.


Fig. 112. Graph of a vertical deviation of a point from the line of regression of $X_1$ on $X_2$.


Since the population is assumed to be a normal bivariate population, it follows that the deviations from the line of regression are normally distributed with a mean of zero and a standard deviation equal to the population first-order standard deviation $\tilde\sigma_{1.2}$. The probability of each deviation will thus be of the form

$$dP(v) = \frac{1}{\tilde\sigma_{1.2}\sqrt{2\pi}}\exp\left[-\frac{v^2}{2\tilde\sigma_{1.2}^2}\right]dv \qquad (2)$$


Since the sample is assumed to be a random one, the various deviations will be independent of each other and the probability of the whole sample of deviations will be the product of their individual probabilities. Inasmuch as exponents are added in multiplication, the probability of the sample set of deviations will be given by the equation

$$dP(v_1, v_2, v_3, \ldots, v_N) = \frac{1}{(\tilde\sigma_{1.2}\sqrt{2\pi})^N}\exp\left[-\frac{v_1^2 + v_2^2 + v_3^2 + \cdots + v_N^2}{2\tilde\sigma_{1.2}^2}\right]dv_1\,dv_2\,dv_3\cdots dv_N$$

which may be written

$$dP(v_1, v_2, v_3, \ldots, v_N) = \frac{1}{(\tilde\sigma_{1.2}\sqrt{2\pi})^N}\exp\left[-\frac{\sum v^2}{2\tilde\sigma_{1.2}^2}\right]dv_1\,dv_2\cdots dv_N \qquad (3)$$
But each $v$ is of the form $X_1 - \tilde a_{1.2} - \tilde b_{12}X_2$, so that the exponent of e can be written $-\sum(X_1 - \tilde a_{1.2} - \tilde b_{12}X_2)^2/2\tilde\sigma_{1.2}^2$. According to the method of maximum likelihood, the values of $\tilde a_{1.2}$ and $\tilde b_{12}$ are to be so chosen that the logarithm of Eq. (3) is a maximum.

Since the logarithm of $1/(\tilde\sigma_{1.2}\sqrt{2\pi})^N$ is independent of $\tilde a_{1.2}$ and $\tilde b_{12}$, it may be ignored in determining their maximum-likelihood estimates. The logarithm of the second part of Eq. (3) is merely the exponent of e, since $\log_e e^x$ is $x$ by definition. Hence, since $2\tilde\sigma_{1.2}^2$ is a positive constant, the logarithm of Eq. (3) will be a maximum when

$$-\sum(X_1 - \tilde a_{1.2} - \tilde b_{12}X_2)^2$$

is a maximum, or $\sum(X_1 - \tilde a_{1.2} - \tilde b_{12}X_2)^2$ is a minimum.

Accordingly, the mathematical problem is so to choose $\tilde a_{1.2}$ and $\tilde b_{12}$ that $\sum(X_1 - \tilde a_{1.2} - \tilde b_{12}X_2)^2$ is a minimum. It will be recognized that this is the criterion of least squares. That is, if a line of regression is fitted to a sample set of data so that the sum of the squares of the deviations from the line is a minimum, then the values of the regression statistics so obtained are the maximum-likelihood estimates of the population regression parameters. But if $\sum(X_1 - \tilde a_{1.2} - \tilde b_{12}X_2)^2$ is to be a minimum, its partial derivatives with respect to $\tilde a_{1.2}$ and $\tilde b_{12}$ must be zero. These two conditions give the equations:

$$\sum(X_1 - \tilde a_{1.2} - \tilde b_{12}X_2) = 0$$
$$\sum(X_1 - \tilde a_{1.2} - \tilde b_{12}X_2)X_2 = 0 \qquad (4)$$
If $\hat a_{1.2}$ and $\hat b_{12}$ represent the maximum-likelihood estimates of $\tilde a_{1.2}$ and $\tilde b_{12}$, respectively, as given by Eq. (4), then

$$\hat a_{1.2} = \bar X_1 - \hat b_{12}\bar X_2$$
$$\hat b_{12} = \frac{\sum X_1X_2 - N\bar X_1\bar X_2}{\sum X_2^2 - N\bar X_2^2} \qquad (5)$$

The sample least-squares values of $a_{1.2}$ and $b_{12}$ are thus the maximum-likelihood estimates of the population parameters. Similarly, the sample least-squares values of $a_{2.1}$ and $b_{21}$ are the maximum-likelihood estimates of $\tilde a_{2.1}$ and $\tilde b_{21}$.
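Equations (5) translate directly into a few lines of code. The following Python sketch, with hypothetical data and illustrative names, computes $\hat a_{1.2}$ and $\hat b_{12}$ and then confirms that the residuals satisfy both conditions of Eq. (4).

```python
def least_squares_line(x1, x2):
    """Regression of X1 on X2 by Eq. (5); returns (a_12, b_12)."""
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    # Numerator and denominator of b_12 in Eq. (5).
    sxy = sum(u * v for u, v in zip(x1, x2)) - n * mean1 * mean2
    sxx = sum(v * v for v in x2) - n * mean2 ** 2
    b = sxy / sxx
    a = mean1 - b * mean2
    return a, b


# Hypothetical paired observations (X2 independent, X1 dependent).
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(x1, x2)

# Both conditions of Eq. (4) hold, up to rounding error.
resid = [u - a - b * v for u, v in zip(x1, x2)]
assert abs(sum(resid)) < 1e-9
assert abs(sum(r * v for r, v in zip(resid, x2))) < 1e-9
```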

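More Than Two Variables. In like manner, it can be shown that the sample least-squares values of the regression coefficients of planes and hyperplanes of regression are the maximum-likelihood estimates of the corresponding population regression parameters. Thus the maximum-likelihood estimates of $\tilde a_{1.23\cdots}$, $\tilde b_{12.3\cdots}$, $\tilde b_{13.2\cdots}$, $\ldots$ are given by the solutions of the least-squares equations

$$\sum(X_1 - \hat a_{1.23\cdots} - \hat b_{12.3\cdots}X_2 - \hat b_{13.2\cdots}X_3 - \cdots) = 0$$
$$\sum(X_1 - \hat a_{1.23\cdots} - \hat b_{12.3\cdots}X_2 - \hat b_{13.2\cdots}X_3 - \cdots)X_2 = 0 \qquad (6)$$
$$\sum(X_1 - \hat a_{1.23\cdots} - \hat b_{12.3\cdots}X_2 - \hat b_{13.2\cdots}X_3 - \cdots)X_3 = 0$$
$$\cdots$$

Similar equations hold for maximum-likelihood estimates of $\tilde a_{2.13\cdots}$, $\tilde b_{21.3\cdots}$, $\tilde b_{23.1\cdots}$, etc.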
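Equations (6) form a small system of linear equations in the unknown regression statistics. A minimal sketch in plain Python follows; it solves the system by Gaussian elimination, a standard method chosen here for illustration rather than one prescribed in the text, and its function names and data are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def least_squares_plane(x1, *independents):
    """Coefficients [a_1.23..., b_12.3..., b_13.2..., ...] satisfying Eqs. (6)."""
    # A column of ones for the constant term, then one column per independent variable.
    columns = [[1.0] * len(x1)] + [list(col) for col in independents]
    k = len(columns)
    # Each equation in (6) multiplies the residual by one column and sums.
    normal = [[sum(u * v for u, v in zip(columns[r], columns[c])) for c in range(k)]
              for r in range(k)]
    rhs = [sum(u * v for u, v in zip(columns[r], x1)) for r in range(k)]
    return solve_linear(normal, rhs)


# Hypothetical data generated exactly from X1 = 1 + 2*X2 - 0.5*X3.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x1 = [1 + 2 * u - 0.5 * v for u, v in zip(x2, x3)]
a, b2, b3 = least_squares_plane(x1, x2, x3)
```

Because these data were built from an exact plane, the computed coefficients reproduce 1, 2, and −0.5 up to rounding error; with observed data the residuals would instead make each of the sums in Eqs. (6) equal to zero.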


SAMPLING DISTRIBUTIONS OF REGRESSION STATISTICS 


Two Variables. The Sampling Distributions of $\hat a_{1.2}$, $\hat b_{12}$, $\hat a_{2.1}$, and $\hat b_{21}$. The sampling distributions of regression statistics calculated by the method of least squares (i.e., the sampling distributions of the maximum-likelihood estimates of the regression parameters) are similar to the sampling distribution of the mean. If the population is normal, these distributions are normal. The means of the sampling distributions are the population values, and the standard deviations are the population higher-order standard deviations $\tilde\sigma_{1.23\cdots}$ divided by $\sqrt N$ times some function of the independent variables. These general formulas will first be explained and illustrated with reference to two variables.

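For a line of regression the maximum-likelihood estimates of $\tilde a_{1.2}$ and $\tilde b_{12}$ are given by Eq. (5). The sampling distributions of these estimates ($\hat a_{1.2}$ and $\hat b_{12}$) are normal with means equal to the population values $\tilde a_{1.2}$ and $\tilde b_{12}$ and standard deviations equal to

$$\sigma_{a_{1.2}} = \frac{\tilde\sigma_{1.2}}{\sqrt N} \quad\text{and}\quad \sigma_{b_{12}} = \frac{\tilde\sigma_{1.2}}{\sigma_2\sqrt N}$$

respectively. Similarly, the sampling distributions of $\hat a_{2.1}$ and $\hat b_{21}$ are normal with means equal to $\tilde a_{2.1}$ and $\tilde b_{21}$ and standard deviations equal to

$$\sigma_{a_{2.1}} = \frac{\tilde\sigma_{2.1}}{\sqrt N} \quad\text{and}\quad \sigma_{b_{21}} = \frac{\tilde\sigma_{2.1}}{\sigma_1\sqrt N}$$

respectively. The formulas for the standard errors of $\hat a_{1.2}$ and $\hat a_{2.1}$ presuppose that the independent variable is measured from its mean, as is done later in this chapter.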

If the first-order standard deviation of the population is not known, it must be estimated from the sample and the $t$ distribution must be used in place of the normal curve, the appropriate value of $n$ being $N - 2$. If the sample is large, the $t$ distribution is so close to the normal distribution that the latter may be used instead. Hence it makes little difference in the case of large samples whether or not the population first-order variance is known. In general, all the discussion of Chap. XI concerning the sampling fluctuations in the mean is applicable to the sampling fluctuations in $\hat a_{1.2}$, $\hat b_{12}$, $\hat a_{2.1}$, and $\hat b_{21}$.

Use of Distributions in Testing Hypotheses. To illustrate the testing of a hypothesis regarding a regression coefficient of a normal bivariate population, consider again the data on grades of Mount Holyoke students in first-semester English, $X_2$, and second-semester English, $X_1$. It was seen in Chap. XIV of Smith and Duncan's Elementary Statistics and Applications that the value of $b_{12}$ is .8322. Since grading in the two courses was apparently on a similar basis, it might be expected that a student whose grade exceeded the mean grade in first-semester English by 20 points, say, would tend to exceed the mean grade in second-semester English by an equal amount, so that the value of $\tilde b_{12}$ might be expected to be 1.00. Let the hypothesis that $\tilde b_{12} = 1$ be tested in the light of the sample result.


Since the population first-order standard deviation is not known, it is necessary to estimate it from the sample. As indicated below,¹ the maximum-likelihood estimate of the population first-order standard deviation is derived from the sample first-order standard deviation by multiplying it by

$$\frac{\sqrt N}{\sqrt{N-2}}$$

¹ See p. 388.

Since the sample first-order standard deviation is 19.53,* the maximum-likelihood estimate of the first-order standard deviation of the population is thus

$$\hat\sigma_{1.2} = 19.53\sqrt{\tfrac{81}{79}} = 19.78$$

The estimate of the standard error of $b_{12}$ is consequently

$$\hat\sigma_{b_{12}} = \frac{\hat\sigma_{1.2}}{\sigma_2\sqrt N} = \frac{19.78}{(47.29)(9)} = .04647$$

since $\sigma_2 = 47.29$ and $N = 81$.

The difference between the sample value, .8322, and the hypothetical value, 1.00, is −.168, which is more than three times the estimated standard error: $-.168/.04647 = -3.6$. Since the sample is large, the normal curve may be used to test the hypothesis. The lower .05 point on a normal curve comes at −1.645. Hence the sample value obviously falls far below the .05 point, and therefore the hypothesis must be rejected. The value of $\tilde b_{12}$ is almost certainly not 1.00.
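The arithmetic of this test can be reproduced as follows. The sketch uses only the figures quoted above ($N = 81$, $\sigma_{1.2} = 19.53$, $\sigma_2 = 47.29$, $b_{12} = .8322$) and the one-sided .05 point −1.645; the variable names are illustrative.

```python
import math

N = 81
sigma_12_sample = 19.53   # sample first-order s.d. of X1 about the line
sigma_2 = 47.29           # sample s.d. of X2
b_12 = 0.8322             # sample regression coefficient
b_hypothesis = 1.00       # hypothetical population value

# Estimate of the population first-order s.d.: multiply by sqrt(N / (N - 2)).
sigma_12_hat = sigma_12_sample * math.sqrt(N / (N - 2))
# Estimated standard error of b_12: sigma_hat / (sigma_2 * sqrt(N)).
se_b = sigma_12_hat / (sigma_2 * math.sqrt(N))
z = (b_12 - b_hypothesis) / se_b

print(f"sigma_hat = {sigma_12_hat:.2f}, se = {se_b:.5f}, z = {z:.2f}")
reject = z < -1.645       # one-sided test at the .05 level, lower tail
```

Carried without intermediate rounding, the standard error comes out a few hundred-thousandths below .04647; the conclusion is unaffected.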

Use of Sampling Distribution in Determining Confidence Limits. Confidence limits can be obtained for $\tilde b_{12}$ as follows: Suppose first that the confidence coefficient is set at .95, in other words, that the confidence limits are to be so chosen that the chances of their including the population value are 95 out of 100. Further, let the confidence interval be such that the chance of failing to include the population value because the interval is set too high is just equal to the chance of failing to include the population value because the interval is set too low. In short, let the desired confidence interval be an unbiased interval with a confidence coefficient of .95.

For the Mount Holyoke data the value of $b_{12}$ was .8322, and the maximum-likelihood estimate of the standard error of $b_{12}$ has just been seen to equal .04647. Since the sample was large ($N = 81$) and the sampling distribution can be taken as normal, an unbiased confidence interval for $\tilde b_{12}$ with a confidence coefficient of .95 will be given by $b_{12} \pm 1.96\hat\sigma_{b_{12}}$, which for the given data yields $.8322 \pm 1.96(.04647) = 0.9233$ and $0.7411$. The desired confidence interval is thus 0.7411 to 0.9233. If the sample had been small, the $t$ distribution would perforce have been used. For example, if $N = 20$, then the desired confidence limits would have been $0.8322 \pm 2.10(.04647)$, where 2.10 is the .05 value for $t$ with $n = 20 - 2 = 18$. Had there been but 20 cases in the sample, therefore, the confidence limits would have been 0.7346 and 0.9298.

* See Smith, J. G., and A. J. Duncan, Elementary Statistics and Applications, p. 363.
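Both sets of limits follow from the same rule: the estimate plus or minus a multiplier times the standard error. In this sketch the multiplier 2.10 is the tabled .05 value of $t$ for $n = 18$ quoted in the text, not one computed by the program, and the function name is hypothetical.

```python
def confidence_limits(estimate, std_error, multiplier):
    """Unbiased two-sided limits: estimate -/+ multiplier * standard error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


b_12, se_b = 0.8322, 0.04647
large = confidence_limits(b_12, se_b, 1.96)   # normal curve, N = 81
small = confidence_limits(b_12, se_b, 2.10)   # tabled t.05 for n = 18, as if N = 20
```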

More than Two Variables. The argument for two variables can be extended without difficulty to samples involving more than two variables. Thus, as already indicated, any maximum-likelihood estimate of a regression parameter of a normal population is normally distributed with a mean equal to the population value of the parameter and a standard deviation equal to $\tilde\sigma_{1.23\cdots m}$ divided by $\sqrt N$ times some function of the independent variables. In the case of two variables the function of the independent variable that was used was its standard deviation. Thus $\sigma_{b_{12}}$ equaled $\tilde\sigma_{1.2}/(\sigma_2\sqrt N)$. For more than two variables the standard-error formulas are equally simple, the form of their equations being symmetrical, as follows:


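$$\sigma_{a_{1.23}} = \frac{\tilde\sigma_{1.23}}{\sqrt N}, \qquad \sigma_{b_{12.3}} = \frac{\tilde\sigma_{1.23}}{\sigma_{2.3}\sqrt N}, \qquad \sigma_{b_{13.2}} = \frac{\tilde\sigma_{1.23}}{\sigma_{3.2}\sqrt N}$$

$$\sigma_{a_{1.234}} = \frac{\tilde\sigma_{1.234}}{\sqrt N}, \qquad \sigma_{b_{12.34}} = \frac{\tilde\sigma_{1.234}}{\sigma_{2.34}\sqrt N}, \qquad \sigma_{b_{13.24}} = \frac{\tilde\sigma_{1.234}}{\sigma_{3.24}\sqrt N}, \qquad \sigma_{b_{14.23}} = \frac{\tilde\sigma_{1.234}}{\sigma_{4.23}\sqrt N}$$

and, in general,

$$\sigma_{a_{1.23\cdots m}} = \frac{\tilde\sigma_{1.23\cdots m}}{\sqrt N}, \qquad \sigma_{b_{1j.23\cdots m}} = \frac{\tilde\sigma_{1.23\cdots m}}{\sigma_{j.23\cdots m}\sqrt N} \qquad (7)$$

where the secondary subscripts of $b_{1j.23\cdots m}$ and $\sigma_{j.23\cdots m}$ include all the independent variables except $X_j$. The standard deviations $\sigma_{j.23\cdots m}$ can be calculated by the equation¹

$$\sigma_{j.kl\cdots m}^2 = \sigma_j^2(1 - r_{jk}^2)(1 - r_{jl.k}^2)\cdots(1 - r_{jm.kl\cdots}^2)$$

¹ Cf. Smith and Duncan, op. cit., p. 436.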
If the value of $\tilde\sigma_{1.23\cdots m}$ is not known, it must be estimated from the sample value $\sigma_{1.23\cdots m}$. When this is done, the $t$ distribution must be used in place of the normal distribution, with $n = N - m$, where $m$ equals the number of regression statistics in the regression equation. If the sample is relatively large, however, say $N - m > 30$, the normal curve can be used with sufficient accuracy, even if $\tilde\sigma_{1.23\cdots m}$ is estimated from the sample value.

Testing a Hypothesis. In Chap. XVII of Elementary Statistics and Applications, it was found that for Mount Holyoke students $b_{12.34} = .7211$, where $X_1$ represents a student's grade in second-semester English, $X_2$ her grade in first-semester English, $X_3$ her verbal scholastic-aptitude test score, and $X_4$ her grade on the College Board Entrance Examination in English. The value of $\sigma_{1.234}$ was 18.63, and the maximum-likelihood estimate of $\tilde\sigma_{1.234}$ is found¹ to be

$$\hat\sigma_{1.234} = 19.55$$


The maximum-likelihood estimate of the standard error of $b_{12.34}$ is $\hat\sigma_{1.234}/(\sigma_{2.34}\sqrt N)$. The value of $\sigma_{2.34}$ may be obtained from the equation²

$$\sigma_{2.34} = \sigma_2\sqrt{1 - r_{23}^2}\sqrt{1 - r_{24.3}^2} \qquad (8)$$

In the present instance this gives

$$\sigma_{2.34} = 47.29(.8046)(.9192) = 34.98$$


The values of $\sqrt{1 - r_{23}^2}$ and $\sqrt{1 - r_{24.3}^2}$ are derived from the data for $r_{23}$ and $r_{24.3}$ given in Elementary Statistics and Applications.³ Hence the maximum-likelihood estimate of the standard error of $b_{12.34}$ is

$$\hat\sigma_{b_{12.34}} = \frac{19.55}{(34.98)(9)} = .0621$$

¹ Cf. p. 376.
² Cf. p. 378.
³ Smith and Duncan, op. cit., pp. 445, 456-458. To obtain values of $\sqrt{1 - r^2}$ for given values of $r$, the sine and cosine tables may be used, for $\sin x = \sqrt{1 - \cos^2 x}$.
So much for the preliminary calculations. Suppose the hypothesis is set up that grades in second-semester English tend to show less variation than those in first-semester English owing to training received in first-semester English. Suppose, further, that it is commonly believed that after allowance is made for differences in verbal aptitudes and in secondary-school training as represented by the College Board Entrance Examination in English, a deviation of 10 points from the mean of first-semester English grades will on the average be accompanied by a deviation of only 7½ points in second-semester English. This is a hypothesis that the value of $\tilde b_{12.34}$ is .7500. The value of $b_{12.34}$ calculated from the sample is .7211. How does the hypothesis fare in the light of this sample result?

To test the given hypothesis it is necessary merely to compare the deviation of the sample value of $b_{12.34}$ from the hypothetical value of $\tilde b_{12.34}$ with the standard error of $b_{12.34}$. For the given data this yields the following result:

$$\frac{.7211 - .7500}{.0621} = \frac{-.0289}{.0621} = -0.465$$

Since the sample is large, the sampling distribution may be taken as normal. Let the coefficient of risk be taken as .05 and let the region of rejection be taken as values of this ratio less than −1.645. The sample value of the ratio, −0.465, does not fall in the region of rejection, and the hypothesis that $\tilde b_{12.34} = .7500$ is therefore not rejected. If the sample had been small, then the $t$ distribution, with $n = N - 4$, would have been used.
Confidence Limits. If the confidence coefficient is set at .95, unbiased confidence limits for the $\tilde b_{12.34}$ of the Mount Holyoke data are given by $b_{12.34} \pm 1.96\hat\sigma_{b_{12.34}}$, which for the given data yields

$$.7211 + 1.96(.0621) = .843 \quad\text{and}\quad .7211 - 1.96(.0621) = .599$$

Hence the chances are .95 that the range from .599 to .843 covers the population value of $\tilde b_{12.34}$. For a small sample the .05 value of $t$ for $n = N - 4$ would have had to be used in place of 1.96.
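The calculations for $b_{12.34}$ can be collected in one place. The sketch below takes $\hat\sigma_{1.234} = 19.55$ as given above rather than recomputing it, applies Eq. (8), and then forms both the test ratio and the .95 limits; the variable names are illustrative.

```python
import math

N = 81
sigma_1_234_hat = 19.55         # estimate of the population s.d. about the plane, as given
sigma_2 = 47.29                 # sample s.d. of X2
sqrt_1_minus_r23_sq = 0.8046    # sqrt(1 - r23^2)
sqrt_1_minus_r24_3_sq = 0.9192  # sqrt(1 - r24.3^2)

# Eq. (8): s.d. of X2 about its regression on X3 and X4.
sigma_2_34 = sigma_2 * sqrt_1_minus_r23_sq * sqrt_1_minus_r24_3_sq
# Eq. (7): standard error of b_12.34.
se_b = sigma_1_234_hat / (sigma_2_34 * math.sqrt(N))

b_sample, b_hypothesis = 0.7211, 0.7500
z = (b_sample - b_hypothesis) / se_b              # compare with -1.645
lower, upper = b_sample - 1.96 * se_b, b_sample + 1.96 * se_b
```

The ratio comes out at about −0.465, well inside the region of acceptance.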
SAMPLING FLUCTUATIONS IN LINE OR PLANE OF REGRESSION


Lines of Regression. For any given value of $X_2$ the sample regression value of $X_1'$ is a function of $\hat a_{1.2}$ and $\hat b_{12}$ given by the sample regression equation $X_1' = \hat a_{1.2} + \hat b_{12}X_2$.¹ Hence sampling fluctuations² in $\hat a_{1.2}$ and $\hat b_{12}$ will cause sampling fluctuations in $X_1'$. Let $X_2$ be measured from its mean value so that the regression equation can be written $X_1' = \hat a_{1.2} + \hat b_{12}x_2$. In this case $\hat a_{1.2} = \bar X_1$, but to avoid a shift in notation it will continue to be called $\hat a_{1.2}$. Under these conditions, sampling fluctuations in $X_1'$ for an arbitrarily selected value³ of $x_2$ will be normally distributed with a mean equal to the population value of $X_1'$ for the selected value of $x_2$, that is, $\tilde X_1' = \tilde a_{1.2} + \tilde b_{12}x_2$, and a variance equal to the variance of $\hat a_{1.2}$ plus the variance of $\hat b_{12}$ multiplied by $x_2^2$. These relationships may be written succinctly as follows:⁴

$$\bar X_1' = \tilde a_{1.2} + \tilde b_{12}x_2$$
$$\sigma_{X_1'} = \sqrt{\sigma_{a_{1.2}}^2 + \sigma_{b_{12}}^2x_2^2} \qquad (9)$$

where $\bar X_1'$ refers to the mean of sample values of $X_1'$ for the selected $x_2$ and $\sigma_{X_1'}$ refers to the standard error.

With the help of these formulas the sampling fluctuations in $X_1'$ can be analyzed in the same manner as the sampling fluctuations in a mean or a regression coefficient were analyzed. For example, a particular hypothesis regarding a certain value of $\tilde X_1'$ can be tested by taking the ratio of the difference between the hypothetical value and the sample value to the estimated standard error of this difference. If the sample is small and the standard error has to be estimated from the data, this ratio is looked up in a $t$ table with $n = N - 2$. If the sample is large or if the standard error is based upon population values, then the normal curve may be used.

¹ In this section and the rest of this chapter, $X_1$ is taken as the dependent variable. The argument is valid for any dependent variable, however, and formulas in which $X_2$ or $X_3$ is taken as the dependent variable may be derived by interchanging the subscripts 2 and 1 or 3 and 1.

² The assumption underlying the argument of this section is that the values of $X_2$ are the same from sample to sample but the values of $X_1$ vary at random. The sampling is therefore of those values of $X_1$ associated with given values of $X_2$.

³ See footnote 2, page 380.

⁴ It will be noted that the "variables" in this case are $\hat a_{1.2}$ and $\hat b_{12}x_2$ (the selected value of $x_2$ being a constant, a sort of "coefficient" for $\hat b_{12}$). Equations (9) are thus merely applications of the theorem that the mean of the sum of two normally distributed variables is the sum of their means and the variance of their sum is the sum of their variances if the variables are independent (cf. pp. 419-421). It may be shown that $\hat a_{1.2}$ and $\hat b_{12}x_2$ are independent in their sampling fluctuations.

Or, again, unbiased confidence limits for $\tilde X_1'$ can be determined by laying off $\pm t_{.05}\sigma_{X_1'}$ from the sample $X_1'$ if the sample is small, or $\pm 1.96\sigma_{X_1'}$ if the sample is large. The symbol $t_{.05}$ refers to the .05 point¹ in a $t$ table for the given value of $n$. The analysis is essentially the same as before and will not be repeated here.

One aspect of the sampling fluctuations in $X_1'$, however, deserves further consideration. Since the standard error of $X_1'$ is a function of the values of the independent variables, confidence limits for $\tilde X_1'$ will vary with its location on the line or plane of regression. Furthermore, it is to be noted from Eqs. (9) that the closer $x_2$ is to zero (that is, the closer $X_2$ is to its mean), the closer the confidence limits are to the sample values of $X_1'$. The loci of the confidence limits for all points on the line of regression yield confidence limits for the line itself. If the confidence coefficient is .95, it may be said that the chances are .95 that these limiting loci include the population line of regression.

If the sample is small and the standard errors have to be estimated from the data, the equations of the limiting loci with a .95 confidence coefficient are

$$X_1' = \hat a_{1.2} + \hat b_{12}x_2 \pm t_{.05}\sqrt{\hat\sigma_{a_{1.2}}^2 + \hat\sigma_{b_{12}}^2x_2^2} \qquad (10)$$

where $t_{.05}$ is the .05 point of a $t$ table for $n = N - 2$. If the sample is large, 1.96 can be used in place of $t_{.05}$. A graph of the limiting loci for the line of regression of $X_1$ on $X_2$ for the Mount Holyoke data is shown in Fig. 113, and the numerical values for the graph are given in Table 44. These were computed from the equations

$$X_1' = 217.4 + 0.8322x_2 \pm 1.96\sqrt{4.83 + 0.002159x_2^2}$$

which are the equations for the limiting loci. Here $4.83 = \hat\sigma_{a_{1.2}}^2$ and is calculated from the equation

$$\hat\sigma_{a_{1.2}}^2 = \frac{\hat\sigma_{1.2}^2}{N}$$

¹ This is the point on each tail that gives an individual tail area of .025 and a combined tail area of .05 (see Table VII in the Appendix).
where $\hat\sigma_{1.2}^2$ is the maximum-likelihood estimate of the population first-order variance and is equal to the sample first-order variance multiplied by $N/(N - 2)$. For the given data,

$$\hat\sigma_{1.2}^2 = \tfrac{81}{79}(19.53)^2 = 391.07$$

and therefore $\hat\sigma_{a_{1.2}}^2 = 391.07/81 = 4.83$. The quantity 0.002159 is $\hat\sigma_{b_{12}}^2$ and is equal¹ to the square of .04647, which was found above to be the standard error of $b_{12}$. The numerical values from which Fig. 113 was constructed are given in Table 44.

[Fig. 113: the sample line of regression $X_1' = 217.4 + 0.8322x_2$ (maximum-likelihood estimate), together with the unbiased .95 confidence limits $X_1' = 217.4 + 0.8322x_2 \pm 1.96\sqrt{4.83 + 0.002159x_2^2}$, plotted against $X_2$.]

Fig. 113. Maximum-likelihood estimate and confidence limits for line of regression. The chances are .95 that these limiting loci include the population line of regression.

It may be noted again that the limiting loci are closest to the sample regression line at the mean of $X_2$ (i.e., where $x_2 = 0$) and tend to swing away from this line as $X_2$ departs further and further from its mean value.

¹ Actually, the value 0.002159 was calculated from the squares of the components and differs slightly from the square of 0.04647 itself.


TABLE 44. SOLUTIONS FOR THE EQUATION SHOWING CONFIDENCE LIMITS FOR A LINE OF REGRESSION

Equation of confidence limits:

$$X_1' = 217.4 + 0.8322x_2 \pm 1.96\sqrt{4.83 + 0.002159x_2^2}$$

Columns: (1) x₂; (2) x₂²; (3) 0.8322x₂; (4) 0.002159x₂²; (5) (4) + 4.83; (6) √(5); (7) 1.96 × (6); (8) upper limit, 217.4 + (3) + (7); (9) lower limit, 217.4 + (3) − (7); last column X₂ = x₂ + 204.

   (1)     (2)      (3)     (4)     (5)     (6)     (7)     (8)     (9)     X₂
  −140   19,600  −116.5   42.32   47.15   6.866   13.4   114.3    87.5     64
  −120   14,400   −99.9   31.09   35.92   5.993   11.7   129.2   105.8     84
  −100   10,000   −83.2   21.59   26.42   5.140   10.1   144.3   124.1    104
   −80    6,400   −66.6   13.82   18.65   4.319    8.5   159.3   142.3    124
   −60    3,600   −49.9    7.77   12.60   3.550    7.0   174.5   160.5    144
   −40    1,600   −33.3    3.45    8.28   2.878    5.6   189.7   178.5    164
   −20      400   −16.6    0.86    5.69   2.385    4.7   205.5   196.1    184
     0        0     0      0       4.83   2.198    4.3   221.7   213.1    204
    20      400    16.6    0.86    5.69   2.385    4.7   238.7   229.3    224
    40    1,600    33.3    3.45    8.28   2.878    5.6   256.3   245.1    244
    60    3,600    49.9    7.77   12.60   3.550    7.0   274.3   260.3    264
    80    6,400    66.6   13.82   18.65   4.319    8.5   292.5   275.5    284
   100   10,000    83.2   21.59   26.42   5.140   10.1   310.7   290.5    304
   120   14,400    99.9   31.09   35.92   5.993   11.7   329.0   305.6    324
   140   19,600   116.5   42.32   47.15   6.866   13.4   347.3   320.5    344
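Table 44 can be regenerated from Eq. (10) with the large-sample multiplier 1.96. The following Python sketch is illustrative only; because it carries full precision rather than the rounded intermediate columns of the table, a few of its limits may differ from Table 44 by about 0.1 in the last place.

```python
import math

A_12 = 217.4       # sample a_1.2 (X2 measured from its mean)
B_12 = 0.8322      # sample b_12
VAR_A = 4.83       # estimated variance of a_1.2
VAR_B = 0.002159   # estimated variance of b_12
MEAN_X2 = 204      # value of the mean of X2 used for Fig. 113
Z = 1.96           # large-sample multiplier for .95 limits


def limits(x2):
    """Return (X2, lower, upper) for the deviation x2 = X2 - mean, per Eq. (10)."""
    center = A_12 + B_12 * x2
    half = Z * math.sqrt(VAR_A + VAR_B * x2 ** 2)
    return MEAN_X2 + x2, center - half, center + half


for x2 in range(-140, 141, 20):
    X2, lo, hi = limits(x2)
    print(f"{x2:5d} {X2:4d} {lo:7.1f} {hi:7.1f}")
```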


The line of regression plotted in Fig. 113 is based upon the value $\bar X_2 = 204$.¹

¹ Cf. Smith, J. G., and A. J. Duncan, Elementary Statistics and Applications.

The analysis of the sampling fluctuations in a line of regression can be extended easily to a plane of regression. Thus, if samples are selected so that the values of the independent variables $X_2, X_3, \ldots$ are the same for each set of samples² and only the values of $X_1$ are allowed to vary at random from sample to sample, then the mean value of $X_1'$ for an arbitrarily selected set of values of $X_2, X_3, \ldots$ will be

$$\bar X_1' = \tilde a_{1.23\cdots} + \tilde b_{12.3\cdots}x_2 + \tilde b_{13.2\cdots}x_3 + \cdots \qquad (11)$$

where $x_2 = X_2 - \bar X_2$, $x_3 = X_3 - \bar X_3$, $\ldots$, and $\bar X_1'$ refers to the mean of the sample values of $X_1'$ for the selected values of $X_2, X_3, \ldots$. The standard error of these sample values of $X_1'$ would be

$$\sigma_{X_1'} = \sqrt{\sigma_{a_{1.23\cdots}}^2 + \sigma_{b_{12.3\cdots}}^2x_2^2 + \sigma_{b_{13.2\cdots}}^2x_3^2 + \cdots} \qquad (12)$$

provided the sampling fluctuations of the regression coefficients are independent of one another, as they are when the independent variables are uncorrelated. The distribution of $X_1'$ will be normal, and this may be used to test hypotheses and determine confidence limits if $\tilde\sigma_{1.23\cdots}$ is known or if it is estimated and the sample is large. If the sample is small, the $t$ table must be used with $n = N - m$, where $m$ is the number of regression statistics in the regression equation.

The limiting loci for a plane of regression are given by the sample plane plus or minus the .05 value of $t$ for $n = N - m$ times the standard error of $X_1'$. The equations are

$$X_1' = \hat a_{1.23\cdots} + \hat b_{12.3\cdots}x_2 + \hat b_{13.2\cdots}x_3 + \cdots \pm t_{.05}\sqrt{\hat\sigma_{a_{1.23\cdots}}^2 + \hat\sigma_{b_{12.3\cdots}}^2x_2^2 + \hat\sigma_{b_{13.2\cdots}}^2x_3^2 + \cdots} \qquad (13)$$

where $t_{.05}$ is the .05 point of a $t$ table for $n = N - m$. For large samples (that is, $N - m > 30$), 1.96 may be substituted for $t_{.05}$.

² Suppose, for example, that $N = 3$ and that in the first set of sample values the independent variables $X_2$, $X_3$, and $X_4$ had the values 2, 3, 5; 3, 1, 6; and 4, 0, 7, respectively. Then all other samples of 3 would have to be selected so that $X_2$, $X_3$, and $X_4$ had these same values in each set of sample values. Only the values of the dependent variable $X_1$ would be allowed to vary at random from one sample set to another.

SAMPLING DISTRIBUTION OF HIGHER-ORDER VARIANCES

The sampling distribution of a higher-order variance is essentially the same as that of a zero-order variance, the form of the distribution being that of the $\chi^2$ distribution. More precisely, it may be said that the statistic $N\sigma_{1.23\cdots}^2/\tilde\sigma_{1.23\cdots}^2$ will vary from sample to sample in the manner of the $\chi^2$ distribution whose $n$ is equal to $N - m$, $m$ being the number of regression statistics in the regression equation. Thus the only difference between the sampling distribution of a zero-order variance and that of a higher-order variance is in the value of $n$ in the $\chi^2$ equation. In the first case, $N\sigma_1^2/\tilde\sigma_1^2$ has a $\chi^2$ distribution with $n = N - 1$, and in the second case $N\sigma_{1.23\cdots}^2/\tilde\sigma_{1.23\cdots}^2$ has a $\chi^2$ distribution with $n = N - m$.
All the sampling analysis regarding zero-order variances may be applied directly to higher-order variances. For example, if the sample variance $\sigma_{1.2}^2$ is $(21.02)^2 = 441.84$, and if there are 10 cases in the sample, the hypothesis that the population variance $\tilde\sigma_{1.2}^2$ is 500, say, could be tested by looking up the value of

$$\frac{N\sigma_{1.2}^2}{\tilde\sigma_{1.2}^2}$$

in a $\chi^2$ table with $n = N - 2$. For the given figures, this would give the result $10(441.84)/500 = 8.84$. On the assumption that a coefficient of risk of .05 is adopted and that the region of rejection is taken all at the lower end of the distribution, it is found that for $n = 10 - 2 = 8$ the .95 point of the $\chi^2$ distribution is 2.733. Since the sample value is larger than this, the hypothesis would have to be accepted.

If the sample is larger than 30, say, the normal curve may be used in place of the $\chi^2$ distribution. In this case, the value of

$$\frac{\sigma_{1.2}^2 - \tilde\sigma_{1.2}^2}{\tilde\sigma_{1.2}^2\sqrt{2/N}}$$

would be determined and the result looked up in a normal table. For example, for the Mount Holyoke data the size of the sample is 81 and $\sigma_{1.2}^2 = 441.84$. To test the hypothesis that the population first-order variance is 490, say, calculate $(\sigma_{1.2}^2 - \tilde\sigma_{1.2}^2)/(\tilde\sigma_{1.2}^2\sqrt{2/N})$. For the given data this has the value

$$\frac{441.84 - 490}{490\sqrt{2/81}} = -.63$$

Since this is numerically less than 1.645, the hypothesis must again be accepted.
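The two tests just described can be sketched as follows. The critical values 2.733 and −1.645 are the tabled points quoted in the text, not values computed by the program, and the function names are hypothetical.

```python
import math


def chi_square_statistic(n_cases, sample_var, hyp_var):
    """N times the sample variance divided by the hypothetical population variance."""
    return n_cases * sample_var / hyp_var


def normal_statistic(n_cases, sample_var, hyp_var):
    """Large-sample ratio (s^2 - sigma^2) / (sigma^2 * sqrt(2/N))."""
    return (sample_var - hyp_var) / (hyp_var * math.sqrt(2 / n_cases))


# Small sample: N = 10, n = N - 2 = 8; lower-tail .05 critical value from a chi-square table.
chi2 = chi_square_statistic(10, 441.84, 500)
small_sample_rejects = chi2 < 2.733

# Large sample: N = 81, one-sided lower-tail test at the .05 level.
z = normal_statistic(81, 441.84, 490)
large_sample_rejects = z < -1.645
```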


Confidence limits with a confidence coefficient of .96 can also be established in the same manner as before. For small samples, set $N\sigma_{1.2}^2/\tilde\sigma_{1.2}^2$ equal first to the upper and then to the lower .02 points of the $\chi^2$ distribution for which $n = N - m$, and solve for $\tilde\sigma_{1.2}^2$. This will give unbiased confidence limits. For example, if $N = 10$ and $\sigma_{1.2}^2 = 441.84$, the upper and lower limits are given¹ by

$$\tilde\sigma_{1.2}^2 = \frac{10(441.84)}{2.032} = 2{,}174 \quad\text{and}\quad \tilde\sigma_{1.2}^2 = \frac{10(441.84)}{18.168} = 243$$

Limits with only an upper or lower bound can also be derived in a manner similar to that described for the zero-order case. The work will not be duplicated here. For large samples, unbiased confidence limits are given by

$$\frac{\sigma_{1.2}^2 - \tilde\sigma_{1.2}^2}{\tilde\sigma_{1.2}^2\sqrt{2/N}} = \pm 1.96$$

If $\sigma_{1.2}^2 = 441.84$ and $N = 81$ as in the Mount Holyoke problem, these become

$$\frac{441.84 - \tilde\sigma_{1.2}^2}{\tilde\sigma_{1.2}^2\sqrt{2/81}} = \pm 1.96$$

which gives $\tilde\sigma_{1.2}^2 = 638$ and $\tilde\sigma_{1.2}^2 = 338$.

Since the sampling distribution of a higher-order variance is the same as that of a zero-order variance except that in its $\chi^2$ form $n = N - m$ instead of $N - 1$, it follows that the maximum-likelihood estimate of the corresponding population higher-order variance is equal to $N/(N - m)$ times the sample variance instead of $N/(N - 1)$ times it.² The proof of this is the same as that given in Chap. XI (pages 290 to 294). In the present instance, $\sigma_{1.2}^2$ for the sample equals 441.84. Hence the maximum-likelihood estimate of the value of $\tilde\sigma_{1.2}^2$ is

$$\hat\sigma_{1.2}^2 = \tfrac{81}{79}(441.84) = 453.03$$
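The limits and the estimate just obtained can be checked with the sketch below. The $\chi^2$ points 2.032 and 18.168 are the tabled .98 and .02 values for $n = 8$ used above; the large-sample limits come from solving the normal-curve equation for $\tilde\sigma_{1.2}^2$ algebraically, giving $\sigma_{1.2}^2/(1 \pm 1.96\sqrt{2/N})$. The function names are hypothetical.

```python
import math


def small_sample_limits(n_cases, sample_var, chi2_lower_point, chi2_upper_point):
    """Set N*s^2/sigma^2 equal to each chi-square point and solve for sigma^2."""
    ns2 = n_cases * sample_var
    return ns2 / chi2_upper_point, ns2 / chi2_lower_point   # (lower, upper)


def large_sample_limits(n_cases, sample_var, z=1.96):
    """Solve (s^2 - sigma^2) / (sigma^2 * sqrt(2/N)) = +/- z for sigma^2."""
    k = z * math.sqrt(2 / n_cases)
    return sample_var / (1 + k), sample_var / (1 - k)        # (lower, upper)


def ml_variance(n_cases, sample_var, m):
    """N/(N - m) times the sample variance, m being the number of regression statistics."""
    return n_cases * sample_var / (n_cases - m)


low_small, high_small = small_sample_limits(10, 441.84, 2.032, 18.168)  # n = 8
low_large, high_large = large_sample_limits(81, 441.84)
estimate = ml_variance(81, 441.84, 2)
```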


USE OF LINE OF REGRESSION AND 
HIGHER-ORDER STANDARD DEVIATION IN ESTIMATING 
THE DEPENDENT VARIABLE 

The principal use of a line or plane of regression and the corresponding higher-order standard deviation is to estimate the dependent variable from the independent variables. For example, the sample data on Mount Holyoke grades show that a student's grade in second-semester English ($X_1$) is on the average related to her first-semester English grade ($X_2$) by the equation (line of regression of $X_1$ on $X_2$)

$$X_1' = 217.4 + .8322(X_2 - \bar X_2)$$

This equation may therefore be used to forecast a student's


¹ Note that $n = 10 - 2 = 8$.
² See p. 289.
second-semester grade from her first-semester grade. For example, if a student gets a first-semester grade of 180, the best estimate that can be made of her second-semester grade will be

$$X_1' = 217.4 + .8322(180 - 204.1) = 197.4$$

But how reliable is this estimate? It is in answering this question that use is made of the first-order standard deviation around the foregoing line of regression. This first-order standard deviation was found to be $\sigma_{1.2} = 19.53$. If the foregoing equation represented the population line of regression and the foregoing standard deviation was the population first-order standard deviation, and if the data were normally distributed, then it could be said that the chances are 95 out of 100 that the actual second-semester grade will fall within the limits $197.4 \pm 1.96\sigma_{1.2}$; that is, between

$$197.4 + (1.96)(19.53) = 235.7 \quad\text{and}\quad 197.4 - (1.96)(19.53) = 159.1$$


The foregoing line of regression and standard deviation are not those of the population, however, but were obtained from a previous sample of 81 grades. Hence additional allowance must be made for the sampling fluctuation in the line of regression and in the first-order standard deviation. The procedure may be outlined as follows:

It should be noted that the difference between the estimate of the second-semester English grade and the actual value that occurs will consist of the sum of two independent deviations. First, there is the deviation of the sample regression line from the population regression line. This is the error that arises from taking the regression of the 81 sample cases as the population regression. Second, there is the deviation of the actual value from the population regression value. This is the error that arises from taking the mean of a distribution as representative of any individual case. The latter will, of course, differ from the mean in accordance with the laws of sampling. There are thus two sampling errors involved in estimating a future grade, one relating to the past sample of 81 grades, the other to the sampling of the future grade itself. The total error is the sum of these two.
$$X_{1,\text{actual}} - X'_{1,\text{estimated}} = (\tilde X_1' - X'_{1,\text{estimated}}) + (X_{1,\text{actual}} - \tilde X_1') \qquad (14)$$

Previous analysis indicated that sample regression values $X_1'$ tend to be normally distributed around the population $\tilde X_1'$ as a mean with a standard deviation equal to

$$\sigma_{X_1'} = \sqrt{\sigma_{a_{1.2}}^2 + \sigma_{b_{12}}^2x_2^2}$$

Also, on the assumption that the grades form a normal bivariate population, it can be said that the actual second-semester grades tend to be normally distributed about the population regression line with a standard deviation equal to the (population) first-order standard deviation. Since sample grades in the future are independent of sample grades in the past, with respect to deviations from the population regression line, the first of the right-hand members of Eq. (14) is independent in its sampling fluctuations of the second right-hand member. Hence the sampling variance of the left-hand member is the sum of these two independent sampling variances.² Thus

$$\sigma_{(X_{1,\text{actual}} - X'_{1,\text{estimated}})}^2 = \sigma_{a_{1.2}}^2 + \sigma_{b_{12}}^2x_2^2 + \tilde\sigma_{1.2}^2$$
$$\sigma_{(X_{1,\text{actual}} - X'_{1,\text{estimated}})} = \sqrt{\sigma_{a_{1.2}}^2 + \sigma_{b_{12}}^2x_2^2 + \tilde\sigma_{1.2}^2} \qquad (15)$$
If the population first-order variance $\sigma^2_{1.2}$ is not known and must be estimated from the sample value, then the ratio of $X_{\text{actual}} - X_{\text{estimated}}$ to its estimated standard error will have a sampling distribution that is of the form of the t distribution, with n = N − 2. Hence to determine "confidence limits"¹ for the actual value of the second-semester English grade, set

$$X_{\text{actual}} = X_{\text{estimated}} \pm t_{.05}\sqrt{\frac{\hat\sigma^2_{1.2}}{N} + \hat\sigma^2_b\,x_2^2 + \hat\sigma^2_{1.2}}$$


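where $t_{.05}$ is the .05 value of a t table for n = N − 2 and $\hat\sigma^2_b = \hat\sigma^2_{1.2}/\sum x_2^2$ is the estimated sampling variance of the regression coefficient. For large samples, $t_{.05}$ can be replaced by 1.96, and the normal curve can be used.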

As already noted, $\hat\sigma^2_{1.2}/N = 4.83$ and $\hat\sigma^2_b = .002159$. Furthermore, $x_2^2 = (180 - 204.1)^2 = 580.81$. Thus the confidence limits


¹ "Confidence limits" in the sense that in repeated sampling these limits will include the actual value 95 per cent of the time. They are not confidence limits in the usual sense of the words, since they do not refer to a population parameter.

² See pp. 419-421.
for the actual value of the second-semester English grade are

$$X_{\text{actual}} = 197.4 + 1.96\sqrt{4.83 + (.002159)(580.81) + 391.07} = 197.4 + 1.96(19.93) = 236.5$$

and

$$X_{\text{actual}} = 197.4 - 1.96(19.93) = 158.3$$


The chances are 95 out of 100 that the range 158.3-236.5 will 
include the actual second-semester grade for the student whose 
grade is being predicted. 
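As a check on the arithmetic, the limits can be recomputed from the quantities quoted above. The following is a minimal Python sketch, assuming the large-sample multiplier 1.96; the function and parameter names are illustrative only, and the inputs 197.4, 4.83, .002159, (180 − 204.1)² and 391.07 are the values given in the text.

```python
import math

def prediction_limits(estimate, var_at_mean, var_coef, x2_dev_sq, var_resid, crit=1.96):
    """Limits for an individual future value predicted from a sample regression line.

    var_at_mean : estimated sigma^2_{1.2} / N (sampling variance of the line at the mean)
    var_coef    : estimated sampling variance of the regression coefficient
    x2_dev_sq   : squared deviation of the given X2 from the sample mean of X2
    var_resid   : estimated first-order (residual) variance sigma^2_{1.2}
    crit        : 1.96 for large samples; for small samples use the t value for n = N - 2
    """
    # Eq. (15): variance of (actual - estimated) = variance of estimated value + residual variance
    se = math.sqrt(var_at_mean + var_coef * x2_dev_sq + var_resid)
    return estimate - crit * se, estimate + crit * se, se

low, high, se = prediction_limits(197.4, 4.83, 0.002159, (180 - 204.1) ** 2, 391.07)
print(f"standard error {se:.2f}, limits {low:.1f} to {high:.1f}")
# standard error 19.93, limits 158.3 to 236.5
```

For a small sample, passing the tabled t value as `crit` gives the wider limits described above.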

Since N is large, these limits for the actual value of $X_1$ do not differ materially from those previously obtained on the assumption that the sample regression and first-order variance are the population regression and first-order variance.¹ For small samples, however, the allowance for sampling errors in the regression line and the first-order variance will make a much greater difference.

Similar analyses will determine confidence limits for forecasts from a plane of regression. For small samples, the limits are given by

$$X_{1,\text{actual}} = X_{1,\text{estimated}} \pm t_{.05}\sqrt{\hat\sigma^2_{X_{1,\text{estimated}}} + \hat\sigma^2_{1.23\cdots m}} \qquad (16)$$

where $t_{.05}$ is the .05 point of a t table for n = N − m. If the samples are large, then 1.96 may be used in place of $t_{.05}$.

¹ See p. 388.


CHAPTER XVI 
PROBLEMS INVOLVING TWO SAMPLES 


Some of the more interesting problems in statistical analysis involve two samples. A sample poll, for example, is taken 1 month before an election and another 2 days before the election. The former shows a Democratic vote of 54 per cent and a Republican vote of 46 per cent; the latter a Democratic vote of 49 per cent and a Republican vote of 51 per cent. If the samples both number 100, can the difference in the two percentages be taken to represent a real change in sentiment, or can such a difference be reasonably attributed to the chance fluctuations of sampling? Again, 200 automobile tires of make A show an average mileage of 16,400 miles; 300 tires of make B, subjected to the same test, show an average mileage of 15,900 miles. Does the difference in mileage indicate that make A is really better than make B, or can the difference be reasonably attributed to chance? It is with such problems as these that the present chapter will be concerned.


OUTLINE OF THE GENERAL ARGUMENT 

Proceeds from a Null Hypothesis. The statistical analysis by which it is determined whether the difference between two samples can reasonably be attributed to chance starts with a null hypothesis. First, the hypothesis is set up that the populations from which the two samples are taken do not differ with respect to the characteristic in question. This is called the "null hypothesis." Second, some statistic is computed from the two samples that is based upon the difference in characteristics being studied. This may be the difference between their mean values, the ratio of their two variances, or the like. Third, the sampling distribution of this statistic is derived on the basis of the null hypothesis. That is, the set of all possible pairs of samples from the assumed populations is derived, and the percentages of these pairs of samples having various values of the selected statistic

are determined. Fourth, a special subset of the set of all possible
pairs of samples is marked as an appropriate region of rejection. 
For example, this might be the upper 5 per cent of a normal 
curve. Finally, it is noted whether the statistic for the given 
pair of samples falls in the chosen region of rejection. If it does, 
the null hypothesis is rejected and the difference between the two 
samples is not attributed to chance. This general procedure will 
be explained more fully below with reference to several selected 
problems. 

Sampling Distribution of the Difference between Two Independent Sample Statistics. Before turning to particular problems, however, two general relationships are worth noting. Often the statistic that is taken to measure the difference between two samples is the arithmetic difference between two individual sample statistics. Thus the statistic may be $\bar X_1 - \bar X_2$, $\sigma_1 - \sigma_2$, or the like.

In general, let such a statistic be designated as $\theta$, and let the two individual sample statistics be designated as $\theta_1$ and $\theta_2$, so that $\theta = \theta_1 - \theta_2$. Now it is shown in the Appendix to this chapter that if the two samples are independent of each other, whatever the nature of the populations from which the two samples have been drawn, the mean of the sampling distribution of $\theta$ is equal to the mean of the sampling distribution of $\theta_1$ minus the mean of the sampling distribution of $\theta_2$, and the variance of the sampling distribution of $\theta$ is equal to the variance of the sampling distribution of $\theta_1$ plus the variance of the sampling distribution of $\theta_2$. In summary, if the two samples are independent,

$$\bar\theta = \bar\theta_1 - \bar\theta_2, \qquad \sigma^2_\theta = \sigma^2_{\theta_1} + \sigma^2_{\theta_2} \qquad (1)$$

These general relationships are worth remembering.
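The relationships in (1) can be checked empirically. The sketch below is a Monte Carlo illustration, not a proof: it draws independent pairs of samples from two deliberately nonnormal populations (an exponential with mean 1 and a uniform distribution on 0 to 6, both chosen arbitrarily for the illustration), takes $\theta_1$ and $\theta_2$ to be the two sample means, and compares the mean and variance of $\theta = \theta_1 - \theta_2$ with the values given by (1).

```python
import random
import statistics

def simulate(trials=10_000, n1=25, n2=40, seed=1):
    """Return theta1, theta2 and theta = theta1 - theta2 over repeated independent sampling."""
    rng = random.Random(seed)
    theta1, theta2, theta = [], [], []
    for _ in range(trials):
        # Population 1: exponential, mean 1, variance 1 (nonnormal)
        s1 = [rng.expovariate(1.0) for _ in range(n1)]
        # Population 2: uniform on (0, 6), mean 3, variance 3
        s2 = [rng.uniform(0, 6) for _ in range(n2)]
        t1, t2 = statistics.fmean(s1), statistics.fmean(s2)
        theta1.append(t1)
        theta2.append(t2)
        theta.append(t1 - t2)
    return theta1, theta2, theta

theta1, theta2, theta = simulate()
mean_check = (statistics.fmean(theta), statistics.fmean(theta1) - statistics.fmean(theta2))
var_check = (statistics.pvariance(theta), statistics.pvariance(theta1) + statistics.pvariance(theta2))
print(mean_check)   # the two entries agree; theory gives 1 - 3 = -2
print(var_check)    # the two entries agree closely; theory gives 1/25 + 3/40 = .115
```

The agreement of the variances depends on the independence of the two samples; if each $s_2$ were built from the same draws as $s_1$, the covariance term would no longer vanish.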


…er sample. Consider, for example, the primary election in Kentucky in which the candidates for the Democratic nomination were Barkley and Chandler. On Apr. 10 a sample poll was taken which gave Barkley 67 per cent of the total vote and Chandler 33 per cent. On July 8 another sample poll was taken which gave Barkley 64 per cent of the vote and Chandler 36 per cent.¹

If the first sample numbered 100 votes and the second 200 
votes, was it to be inferred that, during the 3 months from April 
to July, public sentiment shifted from Barkley to Chandler, or 
could the difference between the two returns have been reasonably 
attributed to chance? This is the question that the following 
argument will seek to answer. 

The Argument. The Null Hypothesis. In accordance with 
the general procedure outlined above, the first step in the analysis 
is to set up a null hypothesis. In the present instance this would 
state that the sentiment of the voters as a whole was the same on July 8 as it was on Apr. 10 and that the difference in sample returns for these two dates was merely the chance result of random sampling. The purpose of the following analysis is to see whether this hypothesis should be accepted or rejected.

The Statistic. The second step in the analysis is to choose an appropriate statistic for measuring the difference between the two sample returns. The statistic usually selected is the difference between the two sample percentages. Call this statistic $p_d$, and let it be defined by $p_d = p' - p''$, where $p'$ is the sample percentage in favor of Barkley on Apr. 10 and $p''$ is the sample percentage in favor of Barkley on July 8.

The Sampling Distribution of $p_d$. It was shown in Chap. IX that the percentage of favorable cases in a sample would vary from sample to sample in accordance with the binomial distribution. The mean of the distribution was found to be equal to $p_1$ and the variance $p_1p_2/N$, where $p_1$ is the percentage of favorable cases in the population and $p_2$ the percentage of unfavorable cases. It was also pointed out that, if the sample was large, the binomial distribution could be approximated by a normal curve whose mean and variance were the same as those of the binomial distribution.

The same sort of result can be demonstrated in the present instance.² Thus it can be shown that the difference between the percentages of two large samples independently derived from the same population (the null hypothesis) is distributed approximately in the form of a normal frequency distribution with a mean of zero and a variance equal to the sum of the variances of the two individual sample percentages. That is, $p_d$ can be taken to be normally distributed with a mean of zero and a variance equal to $p_1p_2/N' + p_1p_2/N''$, where $p_1$ and $p_2$ are the percentages of favorable and unfavorable cases in the population from which the two samples are assumed to have been taken and $N'$ and $N''$ the numbers in the two samples.

¹ GALLUP, G., and S. F. RAE, "Is There a Bandwagon Vote?" The Public Opinion Quarterly, Vol. 4 (1940), p. 245.

² The mathematical analysis is beyond the scope of this book.

In the present instance, $p_1$ and $p_2$ are not known and must therefore be estimated from the samples. This can be done by taking $p_1$ as the weighted average of $p'$ and $p''$ and by taking $p_2$ as 1 minus the estimated value of $p_1$. Thus $p_1$ and $p_2$ can be estimated from the equations

$$\hat p_1 = \frac{N'p' + N''p''}{N' + N''} \qquad \text{and} \qquad \hat p_2 = 1 - \hat p_1 \qquad (2)$$
For the data given, this gives

$$\hat p_1 = \frac{100(.67) + 200(.64)}{100 + 200} = .65 \qquad \text{and} \qquad \hat p_2 = 1 - \hat p_1 = .35$$

Hence, for the problem in hand, $p_d$ can be taken as normally distributed with a mean of zero and a variance equal to

$$\sigma^2_{p_d} = (.65)(.35)\left(\frac{1}{100} + \frac{1}{200}\right) = .003413$$


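The Region of Rejection. Let it be assumed that in instances of this kind the polling agency is unwilling to run the risk of rejecting the null hypothesis when it is true more than 5 times out of 100. This means that the region of rejection must be chosen so that the probability of a sample falling in it, when the null hypothesis is true, is equal to .05.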

Since a real shift in public sentiment away from Barkley and toward Chandler might ultimately lead to the election of the latter, whereas a shift toward Barkley would merely strengthen his existing lead, it may be presumed that the polling agency would be more concerned about accepting the null hypothesis when the former shift in public sentiment had occurred than when the latter had taken place. On the assumption of such an attitude, it would appear that the upper .05 tail of the sampling distribution of $p_d$ would be the best region of rejection to adopt.
For the probability of a sample $p_d$ falling in this region would then be greatest if public sentiment had actually shifted toward Chandler. The region of rejection in the present instance will therefore be the upper .05 tail of a normal curve whose mean is zero and whose variance is .003413.

Testing the Null Hypothesis. The final step is to test the given hypothesis by noting whether the given sample falls in the selected region of rejection. Since this region comprises the upper .05 tail of the normal curve, it will include all values of $p_d/\sigma_{p_d}$ equal to or greater than 1.645. In the present instance,

$$p_d = 67\text{ per cent} - 64\text{ per cent} = 3\text{ per cent}, \qquad \sigma_{p_d} = \sqrt{.003413} = 5.8\text{ per cent}$$

Hence

$$\frac{p_d}{\sigma_{p_d}} = \frac{3\text{ per cent}}{5.8\text{ per cent}} = .517$$


The sample obviously does not fall in the region of rejection. 
The null hypothesis is thus accepted, and the difference in the 
sample polls is attributed to chance. 
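The arithmetic of this test can be reproduced with a short Python sketch (the function name is hypothetical). Carried through without rounding $\sigma_{p_d}$ to 5.8 per cent, the ratio comes out near .514 rather than .517, the small discrepancy being due to rounding; either way it is far below 1.645.

```python
import math

def pooled_difference_test(p_a, n_a, p_b, n_b):
    """Normal test of p_d = p_a - p_b for two independent samples (null: same population)."""
    p1 = (n_a * p_a + n_b * p_b) / (n_a + n_b)   # Eq. (2): weighted average estimate
    p2 = 1 - p1
    var_pd = p1 * p2 * (1 / n_a + 1 / n_b)       # variance of p_d under the null hypothesis
    z = (p_a - p_b) / math.sqrt(var_pd)
    return p1, var_pd, z

# Apr. 10 poll: 67 per cent of 100; July 8 poll: 64 per cent of 200
p1, var_pd, z = pooled_difference_test(0.67, 100, 0.64, 200)
reject = z >= 1.645    # one-sided region: upper .05 tail of the normal curve
print(round(p1, 2), round(var_pd, 5), round(z, 3), reject)
```

The one-sided comparison `z >= 1.645` mirrors the choice of the upper tail made above; a two-sided test would compare `abs(z)` with 1.96 instead.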

In conclusion, it should be noted once again that the foregoing 
analysis relates only to independent samples. If the same people 
were questioned on July 8 as those questioned in April, the sample
results would not be independent of each other and the above 
analysis could not be applied. 

Alternative Argument. Instead of using the foregoing argu- 
ment it would have been possible to solve the given problem by 
a test of independence such as that described in Chap. XIII.
The steps in this alternative argument are as follows: 

First the results of the two polls are set up in a contingency table in which the votes are classified according to candidates on the one hand and dates of polls on the other. This has been done in Table 45.

TABLE 45. CROSS CLASSIFICATION OF POLL DATA

Date of poll    Votes favoring Barkley    Votes favoring Chandler    Total votes
Apr. 10                  67                         33                   100
July 8                  128                         72                   200
Total                   195                        105                   300

In this form the problem can be stated as follows: Is the classification according to candidates independent of the classification according to dates, or does the division of votes vary significantly from April to July?

The null hypothesis assumes that the true division of sentiment between the two candidates is the same in April and July and that the apparent differences are due to the chance effects of sampling. On the basis of this hypothesis, estimates are made of the true division of sentiment by taking a weighted average of the sample results for the 2 months. Thus the percentage in favor of Barkley may be estimated as equal to

$$\frac{100(.67) + 200(.64)}{300} = \frac{195}{300} = .65$$

and the percentage in favor of Chandler may be estimated as equal to

$$\frac{100(.33) + 200(.36)}{300} = \frac{105}{300} = .35$$

The total vote in each month is then distributed in the same proportion as these estimates of the hypothetical division of sentiment. The results are shown in Table 46. The question then is: Do the actual results, shown in Table 45, differ significantly from the expected results shown in Table 46?


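TABLE 46. POLL RESULTS THAT WOULD BE EXPECTED ON THE ASSUMPTION OF INDEPENDENCE

Date of poll    Votes favoring Barkley    Votes favoring Chandler    Total votes
Apr. 10                  65                         35                   100
July 8                  130                         70                   200
Total                   195                        105                   300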


As pointed out in Chap. XIII, questions of this kind can be answered by calculation of the statistic

$$\sum \frac{(\text{actual number in each cell} - \text{expected number})^2}{\text{expected number}} \qquad (3)$$

and making use of the fact that this statistic has a sampling distribution of the form of a $\chi^2$ distribution. The degrees of freedom, it is noted, will be equal to $(r-1)(c-1)$, where r is the number of rows in the contingency table and c the number of columns. Since r and c both equal 2 in the present problem, it follows that statistic (3) for this problem will be distributed like $\chi^2$ with n = (1)(1) = 1.

It remains now to determine what part of the sampling distri- 
bution will serve as the best region of rejection. In the previous 
argument the upper .05 tail of the normal curve (the sampling 
distribution there used) was considered as the best region to 
employ. For it was assumed that the polling agency would be 
especially anxious not to accept the null hypothesis if the actual 
trend in public sentiment was away from Barkley and toward 
Chandler. The same assumption will be made here.

In view of this assumption it might appear at first glance that 
the upper .05 tail of the sampling distribution would again be 
the most appropriate region of rejection to employ, but this is 
not so. Statistic (3) does not take account of signs, and large 
values might arise from sample differences in favor of Barkley 
just as readily as from sample differences in favor of Chandler. 
If the region of rejection in this case is to be comparable with the 
upper .05 tail of the normal curve used in the previous argument, 
it should include only those values of the statistic that arise from 
differences in favor of Chandler. Such differences, on the basis 
of the null hypothesis, will occur only 50 per cent of the time. 
Hence, a .05 region of rejection comparable to that of the previous 
argument may be taken to constitute all values of statistic (3) 
that arise from differences in favor of Chandler and that at the 
same time are equal to or greater than the upper .10 point of the 
$\chi^2$ distribution (n = 1), that is, equal to or greater than 2.706.

The problem can now be solved. For the given data the value of statistic (3) is as follows:

$$\frac{(67-65)^2}{65} + \frac{(33-35)^2}{35} + \frac{(128-130)^2}{130} + \frac{(72-70)^2}{70} = .264$$

Since this is less than 2.706, the sample does not fall in the selected region of rejection and the null hypothesis is again accepted.¹ As before, the difference between the two samples is attributed to the chance effects of sampling.
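A minimal Python sketch of the same $\chi^2$ computation follows, building the expected counts of Table 46 from the marginal totals of Table 45 (the helper name is hypothetical). It also uses the standard library's `statistics.NormalDist` to confirm two facts relied on above: that the upper .10 point of $\chi^2$ for n = 1 is $1.645^2 \approx 2.706$, and that for n = 1 the square root of $\chi^2$ behaves like the absolute value of a standard normal deviate.

```python
import math
from statistics import NormalDist

observed = [[67, 33],    # Apr. 10: votes for Barkley, Chandler
            [128, 72]]   # July 8

def chi_square(table):
    """Expected counts under independence and statistic (3) for a contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    expected = [[r * c / grand for c in col_tot] for r in row_tot]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))
    return expected, stat

expected, chi2 = chi_square(observed)
crit_upper_10 = NormalDist().inv_cdf(0.95) ** 2              # upper .10 point of chi2 for n = 1
p_two_sided = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))    # P(chi2 >= observed) for n = 1
print(expected, round(chi2, 3), round(crit_upper_10, 3), round(p_two_sided, 2))
```

The direction condition (only differences in favor of Chandler count toward rejection) is not coded here; in this sample the shift from 67 to 64 per cent is toward Chandler, so only the size of $\chi^2$ matters.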


¹ It is interesting to compare probabilities in this problem. By the first method $p_d/\sigma_{p_d} = .517$, and the probability of as great or greater a value in either direction (+ or −) is approximately .60. By the second method the value of $\chi^2$ is .264, and the probability of as great or greater a value is again roughly .60. In fact $\sqrt{\chi^2}$ is, for n = 1, distributed like double the upper half of a normal curve, and in the given problem $\sqrt{.264}$ equals approximately .514, which is the same as .517 except for errors arising from the use of decimals.

DIFFERENCE BETWEEN TWO INDEPENDENT SAMPLE MEANS

When the Populations Have a Common Known Variance.
The Problem. The foregoing section was concerned with the difference between two samples from a discrete population. Similar problems arise in the case of continuous data. A problem that often occurs is whether the means of two independent samples are significantly different. This problem is attacked in different ways, depending on the conditions involved. In this section it will be assumed that the variances of the populations from which the two samples have been drawn are the same and that this common value is known. The question to be tested will be: Are the means of the populations also the same? Only normal populations will be considered here; nonnormal cases will be discussed in Chap. XVIII.

Some years ago the New York Sun published employment data for a large number of industrial companies. Data were given for each company for the years 1929 and 1935. A random sample of 10 of the smaller companies² was taken for each of these 2 years; the mean of the first was found to be 89.1 men, and the mean of the second 123.5 men. A different set of companies was taken for each year so as to make the samples independent of each other. If the standard deviations of the populations are both known in this instance to be 100 and if the populations are taken to be normal, can it be inferred from these two samples that industrial employment among small companies was in general larger in 1935 than in 1929? This is the problem that the following argument will seek to solve.

The Null Hypothesis. To determine whether the means of the two samples are significantly different, it will first be assumed that they are not different, and then the consequences of this assumption will be compared with the actual results. The null hypothesis will thus be that the two samples are from identical normal populations with standard deviations of 100.

² Usually a larger sample than this would be taken. A small sample is taken here to show that the analysis is applicable to both large and small samples.
The Statistic. The statistic that is generally selected in problems of this sort is the difference between the sample means. If the mean employment of the 10 companies in 1929 is indicated as $\bar X_1$ and the mean employment of the 10 companies in 1935 is indicated as $\bar X_2$, then the selected statistic is defined as

$$\bar X_{1-2} = \bar X_1 - \bar X_2$$

The Sampling Distribution of $\bar X_{1-2}$. It can be shown by mathematical analysis¹ that, when the populations from which the two samples are independently drawn are normal populations, the sampling distribution of $\bar X_{1-2}$ is also normal, the mean of the distribution being zero and its variance being equal to the identical variance of the two populations multiplied by $1/N_1 + 1/N_2$. That is, if all possible pairs of samples of 10 each were selected from the assumed populations and the differences between the means of these samples were computed, it would be found that the differences would form a normal frequency distribution with a mean of zero and a variance equal to $\sigma^2(1/N_1 + 1/N_2)$. Since in the given problem $\sigma$ is 100 and $N_1 = N_2 = 10$, it follows that $\bar X_{1-2}$ has a sampling distribution whose mean is zero and whose variance is $(100)^2(1/10 + 1/10) = 2{,}000$.

The Region of Rejection. Let the risk of rejecting the null hypothesis when it is true be placed at .05. This means that the total region of rejection should be of this size. Furthermore, there appears to be no special reason in this instance to put all the region of rejection at one end of the sampling distribution. For, in the absence of any special motive in making the statistical investigation, it would appear to be as bad an error to accept the null hypothesis when in fact employment was greater in 1929 than in 1935 as it would be to accept the null hypothesis when in fact employment was greater in 1935 than in 1929. On this basis the total region of rejection will be split up equally so as to include the .025 tails at each end of the sampling distribution. The points marking these two tails, it will be recalled, are $\bar X_{1-2}/\sigma_{\bar X_{1-2}} = \pm 1.96$.


¹ The analysis is essentially the same as that showing that the mean of a sample itself is normally distributed (cf. Chap. X).
The Test of the Null Hypothesis. The difference between the sample means is 89.1 − 123.5 = −34.4. The variance of the sampling distribution of $\bar X_{1-2}$ was computed to be 2,000. The standard deviation is the square root of this, or 44.7. The ratio of the difference between the two means to its standard deviation is thus −34.4/44.7 = −.77, which is well within the ±1.96 points marking the region of rejection. Accordingly, there is no basis for rejecting the null hypothesis, and the difference between the two sample means can presumably be attributed to chance.
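The known-variance test can be sketched in a few lines of Python. This is a minimal illustration assuming, as the text does, normal populations with a common known standard deviation of 100; the function name is hypothetical.

```python
import math

def known_variance_z(mean1, mean2, sigma, n1, n2):
    """Ratio of the difference between two sample means to its standard error,
    when both normal populations have the same known standard deviation sigma."""
    var_diff = sigma ** 2 * (1 / n1 + 1 / n2)   # sampling variance of X1 - X2
    return var_diff, (mean1 - mean2) / math.sqrt(var_diff)

var_diff, z = known_variance_z(89.1, 123.5, 100, 10, 10)
reject = abs(z) > 1.96    # symmetrical .05 region of rejection
print(round(var_diff), round(z, 2), reject)   # 2000 -0.77 False
```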

When the Populations Have a Common but Unknown Vari- 
ance. In some problems it may be assumed that the variances 
of the populations from which the two samples have been taken 
are identical, but the value of this common variance may not be 
known. In this instance the population variance must be 
estimated from the samples. For small samples, this requires 
some modification in the foregoing analysis. 

When two samples are taken from normal populations with identical variances, the maximum-likelihood estimate of this common variance, based on that part of the sampling variation in the variances that is independent of the difference between the sample means, is¹

$$\hat\sigma^2 = \frac{N_1\sigma_1^2 + N_2\sigma_2^2}{N_1 + N_2 - 2} \qquad (4)$$

where $\sigma_1^2$ and $\sigma_2^2$ are the two sample variances and $N_1$ and $N_2$ are the numbers in the samples. Except for the −2, this is merely a weighted mean of the two sample variances. The reason for the division by $N_1 + N_2 - 2$ instead of $N_1 + N_2$ lies in the fact that each of the sample variances is measured from its own mean instead of the true population mean. Since the sample mean itself varies from sample to sample, this reduces somewhat the average value of the sample variances as compared with the true population variance. To correct for this bias in both the sample variances, the factor $N_1 + N_2 - 2$ is substituted for $N_1 + N_2$. The value of $\hat\sigma^2$ is said in this instance to be an estimate of $\sigma^2$ based on $N_1 + N_2 - 2$ degrees of freedom.

¹ The probability of getting the two samples can be broken up into two parts, one depending only on the two sample means, the other only on the two sample variances. When the common population variance is chosen so as to maximize this second part of the total probability, its value is found to be that given in the text.
Let the foregoing estimate of the population variance be multiplied by $1/N_1 + 1/N_2$, and take the ratio of the difference between the two sample means to the square root of this quantity. The resulting ratio, which may be written

$$\frac{\bar X_1 - \bar X_2}{\hat\sigma\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}} \qquad (5)$$

is similar to the ratio $\bar X_{1-2}/\sigma_{\bar X_{1-2}}$ that was found in the previous problem to be distributed in accordance with the standard normal curve. The only difference is that the estimated rather than the actual population variance is now used. This difference, however, causes the statistic to have a sampling distribution that is nonnormal for small samples. Actually, the sampling distribution of this statistic is of the form of the t distribution, with n in the t formula, i.e., the degrees of freedom, equal to $N_1 + N_2 - 2$.

Since the t distribution approaches the normal curve when n is large (say greater than 30), the use of the estimated value of the population variance leads to no change in the analysis if the samples are large.¹ When the samples are small, however, i.e., when $N_1 + N_2 - 2$ is less than 30, it is better to use the t distribution in place of the normal distribution for testing the given hypothesis. This is the only fundamental change required in the previous analysis.

To illustrate the estimation of the common population variance, consider once again the data on industrial employment in 1929 and 1935. Let it be assumed that the variance in employment of small industrial companies is the same in both years but


¹ The t distribution is leptokurtic, and its variance equals $n/(n-2)$. Hence a better approximation can be obtained for values of n between 30 and 100 by multiplying the statistic $(\bar X_1 - \bar X_2)\big/\hat\sigma\sqrt{1/N_1 + 1/N_2}$ by $\sqrt{(n-2)/n}$ before looking up the value in the normal table. It will be noted that here $n = N_1 + N_2 - 2$.
that its value is not known and must therefore be estimated from the samples. Suppose that the variance of the 1929 sample is 9,834.5 and the variance of the 1935 sample is 18,832.1. As stated above, the maximum-likelihood estimate of the common variance is found as follows:

$$\hat\sigma^2 = \frac{10(9{,}834.5) + 10(18{,}832.1)}{10 + 10 - 2} = 15{,}926$$

The square root of this is 126.2, and the statistic

$$\frac{\bar X_1 - \bar X_2}{\hat\sigma\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}$$

has the value

$$\frac{89.1 - 123.5}{126.2\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -.61$$

For n = 10 + 10 − 2 = 18, the .025 points of the t distribution are ±2.101. Values of t numerically greater than 2.101 will thus constitute a symmetrical .05 region of rejection. It is clear that the sample value of t does not in this instance fall in the region of rejection, and again the null hypothesis is not rejected. As before, the difference between the two sample means is apparently due to chance.

If the samples had numbered 50 cases each instead of 10, the value of this statistic would have been

$$\frac{89.1 - 123.5}{118\sqrt{\dfrac{1}{50} + \dfrac{1}{50}}} = -1.46$$

Since the samples are large, the normal curve can in this case be used instead of the t curve, even though the population variance has been estimated. The region of rejection will therefore consist of values of the statistic numerically greater than 1.96. The sample value is −1.46, and this again fails to fall in the region of rejection. The null hypothesis continues to be accepted even though the samples are now larger.
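Both calculations can be checked with a minimal Python sketch of Eq. (4) and statistic (5). It assumes, as the text does, that the sample variances were computed with divisor N; the critical value 2.101 is hard-coded from a t table rather than computed, and the estimated standard deviation of 118 for the 50-case illustration is taken directly from the text. The function and constant names are hypothetical.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Statistic (5) with the common variance estimated by Eq. (4).

    var1 and var2 are sample variances computed with divisor N, as in the text.
    Returns the variance estimate, the ratio and its degrees of freedom.
    """
    var_hat = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)          # Eq. (4)
    ratio = (mean1 - mean2) / (math.sqrt(var_hat) * math.sqrt(1 / n1 + 1 / n2))
    return var_hat, ratio, n1 + n2 - 2

var_hat, t, df = pooled_t(89.1, 9834.5, 10, 123.5, 18832.1, 10)
T_025_DF18 = 2.101    # .025 point of t for n = 18, read from a t table
print(round(var_hat), round(t, 2), df, abs(t) > T_025_DF18)

# Large-sample case from the text: 50 cases each, estimated standard deviation 118
z50 = (89.1 - 123.5) / (118 * math.sqrt(1 / 50 + 1 / 50))
print(round(z50, 2), abs(z50) > 1.96)
```

Where SciPy is available, `scipy.stats.ttest_ind` with `equal_var=True` performs the same pooled test on raw observations, though it expects the data themselves rather than summary variances.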
When the Populations Do Not Have the Same Variance. If it cannot be assumed that the populations from which the two samples have been drawn have identical variances, only a rough test can be employed to determine whether the populations might have the same means. If the samples are fairly large (say 100 or more), it can be assumed with a fair degree of accuracy that the sampling distribution of the difference between two sample means is a normal distribution with a mean of zero and a variance that is approximately equal to the variance of sample 1 divided by $N_1$ plus the variance of sample 2 divided by $N_2$.

That is, for large samples, $\sigma^2_{\bar X_1 - \bar X_2}$ can be taken as equal to $\sigma_1^2/N_1 + \sigma_2^2/N_2$, and the ratio

$$\frac{\bar X_1 - \bar X_2}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}} \qquad (6)$$

can be treated as if it were normally distributed. Except for the different method of estimating $\sigma_{\bar X_1 - \bar X_2}$, the procedure is essentially the same as that outlined in the foregoing sections, and illustrations will be omitted.
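For reference, statistic (6) reduces to a one-line computation. The following is a minimal Python sketch with hypothetical function names; it only encodes the rough large-sample rule stated above and makes no small-sample correction.

```python
import math

def unequal_variance_ratio(mean1, var1, n1, mean2, var2, n2):
    """Ratio (6): difference of sample means over the square root of var1/N1 + var2/N2.

    Intended only for the rough large-sample test described above (say N of 100 or
    more in each sample), where the ratio is referred to the normal curve.
    """
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / se

def rejects_at_05(ratio):
    """Symmetrical .05 region of rejection on the normal curve."""
    return abs(ratio) > 1.96
```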


CORRELATED SAMPLES 


The Problem. If in the foregoing problem the same companies had been taken in 1935 as were taken in 1929, the two samples would not have been independent and the foregoing analysis could not have been validly applied. Data often occur in this form. A group of students, for example, may be tutored for one examination and not tutored for another. Again, a set of hogs may be fed one diet for one month and the same set of hogs fed another diet for another month. How in these cases is a statistical test to be made of the effect of tutoring on students' grades or the effect of diet on the rate of growth of hogs? It is these questions that the following analysis seeks to answer.

Testing Individual Differences. When two samples relate to the same set of individuals, the simplest method of analysis is to take the difference between the two results for each individual. If each sample numbers N cases, this process will give a set of N individual differences. If the two sets of data are not really different, the whole population of individual differences will have a mean of zero. Sample sets of differences will, of course, have means that are not zero, but these sample means will tend to be distributed around this population mean of zero.

If $\sigma^2$ is the variance of the whole population of differences, the distribution of sample mean differences will have a variance of $\sigma^2/N$. This distribution will be normal in form, so that, if the variance of the population of differences is known, the mean of any particular sample of differences can be tested by taking the ratio of this mean value to $\sigma/\sqrt{N}$ and looking up the result in a normal frequency table.

If the variance of the population of differences is not known, it must be estimated from the given sample. The maximum-likelihood estimate of the population variance that can be made independently of the sample mean is $\hat\sigma^2 = \dfrac{N}{N-1}\,\sigma'^2$, where $\sigma'^2$ is the variance of the sample of differences. The ratio of the sample mean difference to $\hat\sigma/\sqrt{N}$ gives a statistic the sampling distribution of which is of the form of the t distribution with n, the degrees of freedom, equal to N − 1. When the variance of the population of differences is not known, therefore, but must be estimated from the sample, the t distribution is used to test the significance of the sample mean difference. Of course, if N is large, the normal curve can be used as an approximation to the t curve. Illustrations of these procedures follow.

Illustrations. Employment figures for 10 small industrial 
companies in 1929 and 1935 are given in Table 47. This also 
shows the individual differences for the 2 years. The mean of
these differences is 10.5, and the question is: Does this sample 
mean difference differ significantly from zero, or can its positive 
value be reasonably attributed to chance? 

First suppose that the standard deviation of the whole population of differences is known to be 50. Then the standard deviation of the distribution of sample means of differences is equal to $50/\sqrt{10} = 15.8$, and the ratio of the given sample mean to this standard error is 10.5/15.8 = .66. This ratio, as pointed out above, is distributed in accordance with the standard normal curve. If the region of rejection is taken as the values of this ratio that are numerically greater than 1.96 (a symmetrical region appears to be justified here), then the sample ratio does not fall in the region of rejection and the null hypothesis is not rejected. That is, the sample mean difference is not significantly different from zero, and the apparent difference in employment may reasonably be attributed to the chance effects of sampling.


TABLE 47. INDIVIDUAL DIFFERENCES IN EMPLOYMENT OF 10 INDUSTRIAL
COMPANIES, 1929 AND 1935

      Number employed
    1929        1935       Differences    Differences squared
     110          48            62               3,844
     166          93            73               5,329
     130         120            10                 100
      95          85            10                 100
      36          52           −16                 256
     188         240           −52               2,704
      85         135           −50               2,500
     185         130            55               3,025
     201         217           −16                 256
     144         115            29                 841
                            Σ = +105          Σ = 18,955


If the population standard deviation is not known, as is generally the case, then it must be estimated from the sample. In the present instance, the standard deviation of the sample differences is equal to 42.25.* The best estimate, therefore, that can be made of the standard deviation of population differences is $\hat{\sigma} = \sqrt{10/9}\,(42.25)$; and this gives

$$\frac{\hat{\sigma}}{\sqrt{N}} = \frac{\sqrt{10/9}\,(42.25)}{\sqrt{10}} = 14.08 \qquad \text{and} \qquad \frac{\bar{X}}{\hat{\sigma}/\sqrt{N}} = \frac{10.5}{14.08} = .75$$

This last quantity, as noted above, is distributed like $t$. Since for $n = 9$ the .025 points of the $t$ distribution are $\pm 2.262$, values of $t$ numerically greater than 2.262 may be taken as a symmetrical region of rejection with the coefficient of risk equal to .05. The sample value $t = .75$ obviously does not fall in this region of rejection. The null hypothesis that the difference between the two samples is due to chance cannot therefore be rejected, and the assumption that there is no real difference in employment in the 2 years continues to be a reasonable one.

* This is calculated by use of the short formula

$$\sigma^2 = \frac{\Sigma X^2}{N} - \bar{X}^2$$

which gives

$$\sigma^2 = \frac{18{,}955}{10} - (10.5)^2 = 1{,}785.25 \qquad \text{and} \qquad \sigma = 42.25$$
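Both versions of the test on Table 47 can be reproduced with a short Python sketch. It assumes the Table 47 figures as printed and the book's convention that the sample variance has divisor N, and it hard-codes the critical values 1.96 and 2.262 quoted in the text rather than computing them; the function name `paired_difference_test` is illustrative only.

```python
import math

# Employment in the 10 companies of Table 47
EMP_1929 = [110, 166, 130, 95, 36, 188, 85, 185, 201, 144]
EMP_1935 = [48, 93, 120, 85, 52, 240, 135, 130, 217, 115]


def paired_difference_test(x, y, sigma_pop=None):
    """Return (mean difference, test ratio) for paired observations x and y.

    If sigma_pop (population s.d. of the differences) is known, the ratio is
    referred to the normal curve.  Otherwise sigma is estimated as
    sqrt(N/(N-1)) times the sample s.d. (divisor N), and the ratio is
    referred to t with N - 1 degrees of freedom.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    if sigma_pop is None:
        var_d = sum(v * v for v in d) / n - mean_d ** 2  # short formula
        sigma = math.sqrt(n / (n - 1)) * math.sqrt(var_d)
    else:
        sigma = sigma_pop
    return mean_d, mean_d / (sigma / math.sqrt(n))


mean_d, z = paired_difference_test(EMP_1929, EMP_1935, sigma_pop=50)
_, t = paired_difference_test(EMP_1929, EMP_1935)
print(f"mean difference = {mean_d:.1f}")
print(f"known sigma: ratio = {z:.2f}, reject at .05? {abs(z) > 1.96}")
print(f"estimated sigma: t = {t:.2f}, reject at .05? {abs(t) > 2.262}")
```

Passing `sigma_pop` switches between the two cases discussed above; with the Table 47 data neither ratio reaches its critical value.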


If there had been 50 companies instead of 10, the ratio $\bar{X}/(\hat{\sigma}/\sqrt{N})$ could have been treated as if it were normally distributed. For example, if the mean and standard deviation of a sample of 50 equaled 10.5 and 42.25, then $\hat{\sigma}$ would have the value $\sqrt{50/49}\,(42.25)$, and

$$\frac{\hat{\sigma}}{\sqrt{N}} = \frac{\sqrt{50/49}\,(42.25)}{\sqrt{50}} = 6.03$$

The ratio $\bar{X}/(\hat{\sigma}/\sqrt{N})$ would then equal $10.5/6.03 = 1.74$. This is less than 1.96, so that, if the two .025 tails of the normal curve are taken as the approximate region of rejection, the null hypothesis would be accepted. It would be concluded once again that employment in 1929 among small firms was not significantly greater than employment among small firms in 1935.
It is interesting to note at this point that, when $X_1$ and $X_2$ are correlated, the variance of the differences $X_1 - X_2$ is equal to the variance of $X_1$ plus the variance of $X_2$ minus twice the product of the standard deviations by the correlation between $X_1$ and $X_2$. Symbolically,

$$\sigma^2_{X_1 - X_2} = \sigma^2_{X_1} + \sigma^2_{X_2} - 2\,\sigma_{X_1}\sigma_{X_2}\,r_{X_1X_2} \qquad (7)$$

Hence it follows that, in problems of correlated samples, the variance of the differences is less if the correlation between $X_1$ and $X_2$ is greater. The practical importance of this conclusion is that differences in central tendencies can be more readily detected if the correlation between individual members of the samples is increased.
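Because Eq. (7) is an algebraic identity for the moments of any set of paired observations, it can be checked numerically. The sketch below reuses the Table 47 figures purely as convenient test data, computes every variance with divisor N as in this book, and uses helper names that are illustrative only.

```python
import math

EMP_1929 = [110, 166, 130, 95, 36, 188, 85, 185, 201, 144]
EMP_1935 = [48, 93, 120, 85, 52, 240, 135, 130, 217, 115]


def mean(v):
    return sum(v) / len(v)


def variance(v):
    """Variance with divisor N."""
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def correlation(u, v):
    """Product-moment correlation of paired lists u and v."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / math.sqrt(variance(u) * variance(v))


diffs = [a - b for a, b in zip(EMP_1929, EMP_1935)]
r = correlation(EMP_1929, EMP_1935)
# Right-hand side of Eq. (7)
rhs = (variance(EMP_1929) + variance(EMP_1935)
       - 2 * math.sqrt(variance(EMP_1929)) * math.sqrt(variance(EMP_1935)) * r)
print(f"r = {r:.3f}")
print(f"variance of differences = {variance(diffs):.2f}, Eq. (7) gives {rhs:.2f}")
```

Here the positive correlation between the two years makes the variance of the differences (1,785.25) much smaller than the sum of the two separate variances.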
DIFFERENCE BETWEEN TWO SAMPLE VARIANCES 


Another problem that often arises in statistical analysis is to determine whether two samples have come from populations with different variances. To refer to a previous example, it might be claimed that a new process for the manufacture of electric-light bulbs will reduce the variability in length of life of the bulbs. To check this claim two lots of bulbs could be produced, one by the old process, the other by the new, and the difference in variability could be subjected to a statistical test. Again, two different brands of razor blades might be tested as to their sharpness, and the variability in the two brands might be compared. It is with such problems that the following sections are concerned. As before, it will always be assumed that the populations are normal; for nonnormal cases the reader is referred to Chap. XVIII.

Testing a Difference in One Direction. The method of testing 
the difference between two sample variances varies to some 
extent with the particular problem involved. If the problem is 
concerned with testing a difference in one direction only, one 
method is applicable; if it is concerned with a difference in either 
direction, another method is required. The present section will 
deal with the testing of a difference in one direction only. 

The Specific Problem. To keep the discussion concrete, consider once again the data on industrial employment among small companies. Suppose it is claimed that, owing to the instability introduced by the depression, there was a greater variability in the size of companies in 1935 than in 1929, size of companies being measured by amount of employment. The variance in employment among the ten 1929 companies, it will be recalled, was 9,834.5, and the variance among the ten 1935 companies was 18,832.1, which would seem offhand to substantiate this claim. The immediate statistical problem is to determine whether this apparent difference in variability is great enough to be attributed to some specific causal factor such as the depression or whether such a difference could reasonably be attributed to the chance effects of sampling.

The Null Hypothesis. In carrying out this statistical test the first step is to set up the null hypothesis that the difference in variance is due, not to some specific causal factor, but only to chance. In other words, the hypothesis states that the 1929 and 1935 populations have the same variance. It will be noted that it does not state that the populations are the same, for nothing is said about the means of the populations. The following test is therefore independent of whether the means of the populations are the same or not. It thus differs from tests of the difference between two means, which are based upon the assumption that the variances are the same.

It should be noted also that the hypothesis assumes that the 
samples are independent of each other. If the data on employ- 
ment pertain to the same companies in both years, then the 
samples would not be independent of each other and the following 
analysis could not be validly used. 

The Statistic. The statistic that is most convenient to use in the present instance is the ratio of the maximum-likelihood estimates of the population variances in the 2 years. The maximum-likelihood estimate of the population variance made independently from the first sample¹ is $N_1\sigma_1^2/(N_1 - 1)$, and the maximum-likelihood estimate of the population variance made independently from the second sample is $N_2\sigma_2^2/(N_2 - 1)$, where $\sigma_1^2$ and $\sigma_2^2$ are the two sample variances and $N_1$ and $N_2$ are the numbers of cases in the two samples. The statistic that is used for this problem is the ratio of these two maximum-likelihood estimates, i.e.,

$$\frac{\sigma_1^2 N_1/(N_1 - 1)}{\sigma_2^2 N_2/(N_2 - 1)} \qquad (8)$$

If the two populations have identical variances, these two estimates will be approximately equal and the above statistic will be close to 1. The statistical problem is to determine whether it differs from unity by an amount greater than can reasonably be attributed to chance.
The Sampling Distribution of the Ratio of Two Maximum-likelihood Estimates of Variance. The reason why the ratio of the two estimates of variance is used rather than their arithmetic difference is that the sampling distribution of the former can more readily be determined. Mathematical analysis shows that this sampling distribution is of the form of the F distribution with $n_1$ and $n_2$ of the F equation equal to $N_1 - 1$ and $N_2 - 1$, respectively. As in other cases, $n_1$ is the degrees of freedom involved in estimating $\sigma_1$ and $n_2$ the degrees of freedom involved in estimating $\sigma_2$.

¹ That is, independently of the mean.

That is, if pairs of samples are drawn independently from populations with identical variances and if the ratio of the maximum-likelihood estimates of variance is calculated for each pair, these ratios will have a frequency distribution that is of the form of the F distribution, with $n_1 = N_1 - 1$ and $n_2 = N_2 - 1$. It might be well for the reader to turn back at this point to Chap. VI and reread the section there on the F distribution. It is also well to note again that this sampling distribution is valid only for ratios that are computed from independent samples.

The Region of Rejection. The present problem is concerned with whether the 1935 variance is greater than the 1929 variance. If the statistic selected is the ratio of the 1935 variance to the 1929 variance, then the best region of rejection to employ is the upper tail of the F distribution. For it has been shown² that, if the presumably larger variance is actually the larger, there will be more chance of rejecting the null hypothesis if the upper tail of the F distribution is used than if any other region is adopted. Of course, it would be possible to take the ratio of the presumably smaller variance to the presumably larger one, and in this case the lower tail of the F distribution would be the more appropriate one to employ. This alternative course is not followed, however, for the tables of the F distribution are computed only for the upper tail and, as noted, the distribution is not symmetrical. In the present instance the upper tail of the F distribution will be employed as the region of rejection, and, as usual, the size of this region will be taken as .05.

For the given problem, the maximum-likelihood estimate of the population variance based upon the 1935 sample is $(10)(18{,}832.1)/9$, and the maximum-likelihood estimate of the population variance based upon the 1929 sample is $(10)(9{,}834.5)/9$. The ratio of the first to the second is 1.915. In this problem, this is the same as the ratio of the two sample variances themselves, since the samples are of the same size. In another problem in which the samples are of different sizes, this equality would not exist.
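The arithmetic of Eq. (8) is easy to script. The sketch below assumes, as in the text, that the sample variances are computed with divisor N and that the sample presumed to have the larger variance goes in the numerator. It returns the ratio and the two degrees of freedom but does not look up the F table, so the critical value must still be taken from the tables; the function name is illustrative.

```python
def variance_ratio(var_num, n_num, var_den, n_den):
    """Ratio (8) of the estimates N*sigma^2/(N - 1) from two independent samples.

    var_num, var_den: sample variances (divisor N); n_num, n_den: sample sizes.
    Returns (ratio, n1, n2), where n1 = N1 - 1 and n2 = N2 - 1 are the
    degrees of freedom for entering the F table.
    """
    est_num = n_num * var_num / (n_num - 1)
    est_den = n_den * var_den / (n_den - 1)
    return est_num / est_den, n_num - 1, n_den - 1


# 1935 sample (presumed larger variance) over 1929 sample
ratio, n1, n2 = variance_ratio(18832.1, 10, 9834.5, 10)
print(f"ratio = {ratio:.3f} with n1 = {n1}, n2 = {n2}")  # ratio = 1.915 with n1 = 9, n2 = 9
```

With equal sample sizes the factors $N/(N-1)$ cancel, which is why the result equals the plain ratio of the sample variances.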


² Cf. Neyman, J., and E. S. Pearson, "On the Problem of the Most Efficient Tests of Statistical Hypotheses," Philosophical Transactions of the Royal Society of London, Series A, Vol. 231 (1933), pp. 289-337.
Testing the Hypothesis. To test the given hypothesis it is necessary merely to note whether the above ratio is greater or less than the .05 point of the F distribution for which $n_1 = 9$ and $n_2 = 9$.* The tables, unfortunately, do not give the .05 point of this particular F distribution, so that its value must be interpolated. For $n_1 = 8$, $n_2 = 9$, the .05 point is $F = 5.467$; for $n_1 = 12$, $n_2 = 9$, the .05 point is 5.111. Therefore, for the F curve for which $n_1 = 9$ and $n_2 = 9$, the .05 point must lie between these two values. Whatever its exact value, it is clear that the sample ratio of 1.915 is well within the region of acceptance, so that the sample value does not fall in the region of rejection, which has been taken to consist of all values of F equal to or greater than the .05 value. The null hypothesis is thus not rejected in the present problem, and the difference in sample variance is to be attributed to chance.

If neither $n_1$ nor $n_2$ has a value that is given directly in the F table, it would be necessary to interpolate in both directions. For example, if $n_1 = 9$ and $n_2 = 40$ for the given samples, then it would be necessary to find the following .05 values:

    For                            The .05 point is
    $n_1 = 8$ and $n_2 = 30$              3.173
    $n_1 = 12$ and $n_2 = 30$             2.843
    $n_1 = 8$ and $n_2 = 60$              2.823
    $n_1 = 12$ and $n_2 = 60$             2.496

It is obvious that the .05 point for $n_1 = 9$, $n_2 = 40$ lies somewhere between 3.173 and 2.496. If the given sample ratio lies beyond 3.173, it will certainly lie beyond the .05 value for $n_1 = 9$ and $n_2 = 40$. Similarly, if the sample ratio lies inside 2.496, it certainly lies within the .05 point for $n_1 = 9$ and $n_2 = 40$.

* It will be noted that $n_1 = N_1 - 1$ and that $N_1$ refers to the size of the sample whose variance is put on top of the fraction expressing the ratio, in this case to the 1935 sample. Thus $N_1$ refers to the sample that has the presumably larger of the two estimates of variance. Likewise, $n_2 = N_2 - 1$, where $N_2$ refers to the size of the sample whose variance is put in the denominator of the fraction.

In such cases there is no need for exact interpolation; but if the sample value lies between 3.173 and 2.496, then the .05 point for $n_1 = 9$ and $n_2 = 40$ may be roughly obtained by straight-line interpolation.¹ First interpolate for $n_2 = 40$. Since the .05 point for $n_1 = 8$, $n_2 = 30$ is 3.173 and the .05 point for $n_1 = 8$, $n_2 = 60$ is 2.823, and since 40 is distant from 30 by 1/3 of the distance between 30 and 60, the .05 point for $n_1 = 8$, $n_2 = 40$ will be approximately equal to $3.173 - (1/3)(3.173 - 2.823) = 3.056$. Likewise, the .05 point for $n_1 = 12$, $n_2 = 40$ will be approximately equal to $2.843 - (1/3)(2.843 - 2.496) = 2.727$. Values for $n_1 = 8$, $n_2 = 40$ and $n_1 = 12$, $n_2 = 40$ having been obtained, it remains to interpolate for $n_1 = 9$. Since 9 is 1/4 of the way from 8 to 12, the .05 point for $n_1 = 9$, $n_2 = 40$ will be approximately equal to

$$3.056 - (1/4)(3.056 - 2.727) = 2.974$$


This is the .05 point desired. As noted above, straight-line interpolation is not perfectly accurate. Consequently, if a sample value falls close to the interpolated value, it is well to forgo any definite conclusion. In such an event it is better to judge the result a borderline case.
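The two-stage straight-line interpolation described above can be written as a small function. It assumes the four tabled values listed earlier and interpolates linearly first in $n_2$ and then in $n_1$, as the text does; the dictionary layout and function names are illustrative, and, as the text warns, the result is only approximate.

```python
def lerp(x, x0, x1, y0, y1):
    """Straight-line interpolation of y at x between (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


def interpolate_f_point(n1, n2, table, n1_bounds, n2_bounds):
    """Interpolate a tabled F point first in n2, then in n1.

    table maps (n1, n2) -> tabled point for the four corner cells.
    """
    lo1, hi1 = n1_bounds
    lo2, hi2 = n2_bounds
    at_lo1 = lerp(n2, lo2, hi2, table[(lo1, lo2)], table[(lo1, hi2)])
    at_hi1 = lerp(n2, lo2, hi2, table[(hi1, lo2)], table[(hi1, hi2)])
    return lerp(n1, lo1, hi1, at_lo1, at_hi1)


# Tabled values quoted in the text, keyed by (n1, n2)
F_POINTS = {(8, 30): 3.173, (12, 30): 2.843, (8, 60): 2.823, (12, 60): 2.496}
approx = interpolate_f_point(9, 40, F_POINTS, (8, 12), (30, 60))
print(f"approximate point for n1 = 9, n2 = 40: {approx:.3f}")  # 2.974
```

Keying the table by `(n1, n2)` makes it harder to mix up the $n_1 = 8$, $n_2 = 60$ entry with the $n_1 = 12$, $n_2 = 30$ entry, whose values are close.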

Testing a Difference in Either Direction. The Problem. The foregoing test of difference between two sample variances was based upon the assumption that the investigator was interested in a significant difference in one direction only. It was on this basis, it will be recalled, that the upper tail of the F distribution was selected as the region of rejection. There may be cases, however, in which the investigator is indifferent as to whether any difference that might exist is in one direction or the other. This is the problem that will now be considered.

The Test. It might be thought offhand that, when the investigator is indifferent as to the way in which the two variances might differ, a satisfactory test could be devised by distributing the region of rejection equally between the two tails of the F distribution. This is not true; for such a test is biased in that the probability of accepting the null hypothesis may be greater in some cases in which the null hypothesis is not true than when it is true. The test described below avoids this difficulty. It is unbiased in that the probability of accepting the null hypothesis when it is true is greater than for any instance when it is not true and in that the probability of rejecting the null hypothesis when it is true is less than for any instance when it is not true.

¹ Better results may possibly be obtained by taking the variable

$$\frac{n_1 F}{n_1 F + n_2}$$

and using tables of the incomplete beta function. Cf. Rider, Paul R., Introduction to Modern Statistical Methods (1939), p. 119.

412 ADVANCED SAMPLING PROBLEMS

Let two samples be taken from normal populations. Let the variance of one sample be $\sigma_1^2$ and the variance of the other sample be $\sigma_2^2$. The investigator, it will be presumed, is interested in testing the null hypothesis that the two population variances are the same, and he is indifferent as to whether any possible difference between them is in one direction or the other. On these assumptions, the best statistic to employ is

$$L = \frac{(n_1 + n_2)\log_e \hat{\sigma}^2 - n_1 \log_e \hat{\sigma}_1^2 - n_2 \log_e \hat{\sigma}_2^2}{1 + a} \qquad (9)$$

in which $n_1 = N_1 - 1$, $n_2 = N_2 - 1$,

$$\hat{\sigma}^2 = \frac{N_1\sigma_1^2 + N_2\sigma_2^2}{n_1 + n_2}$$

(that is, $\hat{\sigma}^2$ is the maximum-likelihood estimate of the population variance based upon the two sample variances taken together),

$$\hat{\sigma}_1^2 = \frac{N_1\sigma_1^2}{n_1}, \qquad \hat{\sigma}_2^2 = \frac{N_2\sigma_2^2}{n_2}$$

(the maximum-likelihood estimates of the population variance based upon the two sample variances separately), and

$$a = \frac{1}{3}\left(\frac{1}{n_1} + \frac{1}{n_2} - \frac{1}{n_1 + n_2}\right)$$

The statistic shown in Eq. (9) has been found to have a sampling distribution that is approximately of the form of the $\chi^2$ distribution with the degrees of freedom $n$ equal to 1. It leads to an unbiased test if the upper tail of the distribution is taken as the region of rejection.¹


An Example. 


¹ Cf. Pitman, E. J. G., "Tests of Hypotheses concerning Location and Scale Parameters," Biometrika, Vol. 31 (1939), pp. 200-215. The L used above is equal to Pitman's 2L and is not the same as his L.

indicate that the population variances are not the same? This will be answered by setting up the null hypothesis that the populations have the same variance and then determining whether this hypothesis can reasonably be accepted on the basis of the sample results. In making this test it will be presumed that the investigator is indifferent as to whether any possible difference between the variances is in one direction or the other.
The quantities required for the calculation of the statistic L are

$$\hat{\sigma}^2 = \frac{10(9{,}834.5) + 10(18{,}832.1)}{9 + 9} = 15{,}926$$

$$\log_e \hat{\sigma}^2 = 2.30259 \log_{10} 15{,}926 = 9.675730$$

$$\log_e \hat{\sigma}_1^2 = \log_e \frac{10(9{,}834.5)}{9} = 2.30259 \log_{10} \frac{10(9{,}834.5)}{9} = 9.299017$$

$$\log_e \hat{\sigma}_2^2 = \log_e \frac{10(18{,}832.1)}{9} = 2.30259 \log_{10} \frac{10(18{,}832.1)}{9} = 9.948582$$

$$a = \frac{1}{3}\left(\frac{1}{9} + \frac{1}{9} - \frac{1}{18}\right) = .0556$$

$$\text{Numerator of } L = (9 + 9)(9.675730) - [9(9.299017) + 9(9.948582)] = 174.163140 - 173.228391 = .934749$$

and L equals this number divided by $1 + a$, that is,

$$L = \frac{.934749}{1.0556} = .8855$$

The coefficient of risk will be taken equal to .05 as previously, and the region of rejection will be the upper .05 tail of the $\chi^2$ distribution for $n = 1$. The table of the $\chi^2$ distribution in the Appendix (Table VIII) shows that the lower limit of this region is 3.841. Since the value of L is well below this limit, the null hypothesis is not rejected and it is concluded that the two samples may reasonably have come from populations with the same variance.
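The computation of Eq. (9) can be checked with a short Python sketch. It assumes sample variances with divisor N, as throughout this chapter, uses natural logarithms directly instead of converting common logarithms, and compares L with the tabled value 3.841 quoted above; the function name is illustrative. Because it carries full precision, it gives L of about .884 rather than the .8855 obtained above with rounded logarithms; the conclusion is unchanged.

```python
import math


def variance_equality_L(var1, N1, var2, N2):
    """Statistic L of Eq. (9) for two independent samples from normal populations.

    var1, var2 are sample variances with divisor N; N1, N2 are sample sizes.
    """
    n1, n2 = N1 - 1, N2 - 1
    pooled = (N1 * var1 + N2 * var2) / (n1 + n2)  # joint estimate
    est1 = N1 * var1 / n1                          # separate estimates
    est2 = N2 * var2 / n2
    numerator = ((n1 + n2) * math.log(pooled)
                 - n1 * math.log(est1) - n2 * math.log(est2))
    a = (1 / n1 + 1 / n2 - 1 / (n1 + n2)) / 3
    return numerator / (1 + a)


L = variance_equality_L(9834.5, 10, 18832.1, 10)
CHI2_05_1DF = 3.841  # upper .05 point of chi-square, n = 1 (from the text)
print(f"L = {L:.4f}; reject equal variances? {L > CHI2_05_1DF}")
```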


ARE TWO SAMPLES FROM THE SAME POPULATION? 


The Problem. The problems so far considered have been concerned with whether the normal populations from which two samples have been drawn differ with respect to a single characteristic, for example, with respect to their means or with respect to their variances. The questions posed were: (1) On the assumption that the variances of the populations are the same, are the means also the same? (2) Without any assumption regarding the means of the populations, are their variances the same? The question now to be raised is: Are both the means and variances the same? This question differs from (1) in that there the variances were assumed to be the same, whereas in the new question the equality of the variances is part of the hypothesis to be tested.

For example, suppose a manufacturer of automobile tires is 
comparing two processes. If he knows or has good reason to 
believe that the variability in mileage is the same for tires manu- 
factured by one process as for those manufactured by the other, 
and if he is interested only in a possible difference in average
mileage, he will use one of the procedures above for testing 
the difference between means. If he does not care about a possible 
difference in average mileage but is interested only in a possible 
difference in variability, he will use one of the procedures for 
testing the difference between variances. Finally, if he is 
interested in whether the tires manufactured by the two proc- 
esses differ either with respect to average mileage or with respect 
to variability, or with respect to both, he will employ the test 
outlined below. 

The Test. When the populations are normal, the best test that can be made of joint equality of means and variances appears to be offered by the statistic¹,²

$$\lambda_H = \left(\frac{\sigma_1}{\sigma_0}\right)^{N_1}\left(\frac{\sigma_2}{\sigma_0}\right)^{N_2} \qquad (10)$$

in which $\sigma_0$ is the standard deviation of the two samples treated as a single sample and $\sigma_1$ and $\sigma_2$ are the two individual sample standard deviations. The value of $\sigma_0^2$, and hence of $\sigma_0$, may be found by throwing the two samples together and calculating the value of $\Sigma(X - \bar{X})^2/(N_1 + N_2)$, where $\bar{X}$ is the mean of the combined samples. If only the values of the sample means and standard deviations are known, $\sigma_0^2$ may be computed from the relationship

$$\sigma_0^2 = \frac{N_1\sigma_1^2 + N_2\sigma_2^2}{N_1 + N_2} + \frac{N_1N_2}{(N_1 + N_2)^2}(\bar{X}_1 - \bar{X}_2)^2 \qquad (11)$$

Table 48 shows the lower³ .05 and .01 points of the sampling distribution of $\lambda_H$ for various values of $N_1$ and $N_2$. The use of this distribution in testing a joint hypothesis will now be illustrated.

¹ The symbol $\lambda_H$ is used in the original article and is continued here. The H has no significance other than to distinguish the statistic.

² Neyman, J., and E. S. Pearson, "On the Problem of Two Samples," Bulletin international de l'Académie polonaise des sciences et des lettres, Classe des sciences mathématiques et naturelles, Série A: Sciences mathématiques (1930), pp. 73-96.


TABLE 48. APPROXIMATE VALUES OF THE LOWER .05 AND .01 POINTS OF THE
SAMPLING DISTRIBUTION OF $\lambda_H$ FOR SELECTED VALUES OF $N_1$ AND $N_2$*

                  Approximate values of $\lambda_H$ for $N_2$ =
    $N_1$      5        10       20       50       ∞      Probability
      5      .0167    .0222    .0241    .0247    .0248       .05
             .0019    .0029    .0033    .0034    .0034       .01
     10      .0222    .0312    .0349    .0364    .0368       .05
             .0029    .0048    .0058    .0061    .0062       .01
     20      .0241    .0349    .0401    .0425    .0432       .05
             .0033    .0058    .0071    .0078    .0080       .01
     50      .0247    .0364    .0425    .0409    .0473       .05
             .0034    .0061    .0078    .0088    .0092       .01
      ∞      .0248    .0368    .0432    .0473    .0500       .05

* By permission from J. Neyman and E. S. Pearson, "On the Problem of Two Samples," Bulletin international de l'Académie polonaise des sciences et des lettres, Classe des sciences mathématiques et naturelles, Série A: Sciences mathématiques (1930), p. 92, Tables II and III.

An Example. Consider once again the employment data of small industrial companies in 1929 and 1935. The two samples of 10 already analyzed for differences in means and variances had means of 89.1 and 123.5 and variances of 9,834.5 (= 99.16²) and 18,832.1 (= 137.2²). The question is: In view of these sample results, could these two samples reasonably have come from the same population? To answer this question the null hypothesis is set up that the populations are the same, and the value of $\lambda_H$ is calculated. For the given data, the value of $\sigma_0^2$ is

$$\sigma_0^2 = \frac{(10)(9{,}834.5) + (10)(18{,}832.1)}{10 + 10} + \frac{(10)(10)}{(10 + 10)^2}(89.1 - 123.5)^2 = 14{,}629.1$$

which gives $\sigma_0 = 120.9$. Hence,

$$\log \lambda_H = (10)(\log 99.16 - \log 120.9) + (10)(\log 137.2 - \log 120.9) = -.31170 = 9.68830 - 10$$

and

$$\lambda_H = .4879$$

³ The greater the differences, the smaller the values of $\lambda_H$.


Table 48 indicates that, the greater the difference between the means and variances, the smaller the value of $\lambda_H$; that is, small values of $\lambda_H$ are significant. Since the foregoing value of $\lambda_H$ is larger than either the .05 or the .01 value for $N_1 = 10$ and $N_2 = 10$, the null hypothesis is not rejected. In other words, there is little reason in this case to believe that the two samples are not from identical populations.
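Equations (10) and (11) can be combined in a few lines of Python. The sketch assumes sample variances with divisor N and works with the unrounded standard deviations. It gives $\lambda_H$ of about .485 rather than .4879 because the hand computation rounds the standard deviations before raising them to the tenth power. Either value is far above the tabled .05 point .0312 for $N_1 = N_2 = 10$. The function name is illustrative.

```python
import math


def lambda_h(N1, mean1, var1, N2, mean2, var2):
    """Return (sigma0^2, lambda_H) for two samples, per Eqs. (11) and (10).

    var1 and var2 are sample variances with divisor N.
    """
    N = N1 + N2
    # Eq. (11): variance of the two samples thrown together
    var0 = (N1 * var1 + N2 * var2) / N + N1 * N2 / N ** 2 * (mean1 - mean2) ** 2
    # Eq. (10) on the log scale: (sigma1/sigma0)^N1 = (var1/var0)^(N1/2), etc.
    log_lam = (N1 / 2) * math.log(var1 / var0) + (N2 / 2) * math.log(var2 / var0)
    return var0, math.exp(log_lam)


var0, lam = lambda_h(10, 89.1, 9834.5, 10, 123.5, 18832.1)
print(f"sigma0^2 = {var0:.1f}, sigma0 = {math.sqrt(var0):.2f}, lambda_H = {lam:.4f}")
print("significant at .05?", lam < 0.0312)  # lower .05 point for N1 = N2 = 10
```

Working with logarithms mirrors the hand computation above and avoids overflow or underflow when $N_1$ and $N_2$ are large.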

In conclusion, it should be noted that, if this test should show a significant difference between the two samples, there is nothing in the result itself that will tell whether the difference lies in the means or in the variances, or in both. In fact, the result is purely a joint product; for the significance of the difference between the means will depend to some extent on the amount of difference in the variances, and vice versa. In the present instance, the individual tests showed no significant difference between either the means or the variances, and it could not therefore be expected that the two samples as a whole would be deemed significantly different. In some cases, however, one of the individual tests might show a significant difference, while the joint test would fail to show any difference. As suggested above, the conclusions to be drawn from the various tests depend entirely upon the problem. Each type of problem has its own appropriate test and should not be confused with tests appropriate for other types of problem.
DIFFERENCE BETWEEN TWO INDEPENDENTLY DERIVED
CORRELATION COEFFICIENTS

The difference between two sample correlation coefficients can readily be tested by making use of the z transformation.¹ A sample z, corresponding to a sample r, it will be recalled, has a sampling distribution that is approximately normal in form, with a variance equal to $1/(N - 3)$. Accordingly, the difference between two sample z's is also practically normal in form with a variance equal to²

$$\sigma^2_{z_{12} - z'_{12}} = \frac{1}{N - 3} + \frac{1}{N' - 3} \qquad (12)$$

Thus to test whether $z_{12}$ is significantly different from $z'_{12}$ it is merely necessary to note whether³

$$\frac{|z_{12} - z'_{12}|}{\sqrt{\dfrac{1}{N - 3} + \dfrac{1}{N' - 3}}} \ge 1.96$$

The problem is the same as that of the difference between two sample means when the variances are known. Thus, if $r_{12} = .7658$ and $r'_{12} = .7398$, and hence $z_{12} = 1.01$ and $z'_{12} = .95$; and if $N = 40$ and $N' = 60$, the quantity

$$\frac{z_{12} - z'_{12}}{\sqrt{\dfrac{1}{N - 3} + \dfrac{1}{N' - 3}}}$$

would be equal to

$$\frac{1.01 - .95}{\sqrt{\dfrac{1}{37} + \dfrac{1}{57}}} = \frac{.06}{.211}$$

or .28. Since this is less than 1.96, it may be concluded that the two correlation coefficients $r_{12}$ and $r'_{12}$ are not significantly different.
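The test of Eq. (12) takes only a few lines with Python's standard library, since `math.atanh(r)` equals the z transformation $\frac{1}{2}\log_e\frac{1+r}{1-r}$. This sketch assumes the two samples are independent and, like the text, uses a two-tailed test with the 1.96 limit; the function name is illustrative.

```python
import math


def compare_correlations(r1, N1, r2, N2):
    """Normal deviate for the difference between two independent sample r's."""
    z1, z2 = math.atanh(r1), math.atanh(r2)       # z = (1/2) ln[(1 + r)/(1 - r)]
    se = math.sqrt(1 / (N1 - 3) + 1 / (N2 - 3))   # square root of Eq. (12)
    return z1, z2, (z1 - z2) / se


z1, z2, dev = compare_correlations(0.7658, 40, 0.7398, 60)
print(f"z1 = {z1:.2f}, z2 = {z2:.2f}, deviate = {dev:.2f}")  # z1 = 1.01, z2 = 0.95, deviate = 0.28
print("significant at .05?", abs(dev) >= 1.96)
```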


¹ See Chap. XII.

² If two variables are independent, the distribution of the difference between two normally distributed variables is normally distributed and has a variance that is the sum of the variances of the two variables (cf. Appendix to this chapter, pp. 419-421).

³ This assumes a region of rejection equally distributed at each end.

DIFFERENCE BETWEEN TWO INDEPENDENTLY DERIVED
REGRESSION PARAMETERS

The difference between two independently derived regression parameters can be tested in the same way as the difference between two means. First, the sample higher-order variances are pooled to give an estimate of the population higher-order variance. Thus

$$\hat{\sigma}^2_{1.23\ldots} = \frac{N'(\sigma'_{1.23\ldots})^2 + N''(\sigma''_{1.23\ldots})^2}{N' + N'' - 2k} \qquad (13)$$

Then the standard error of the difference between the two regression parameters, say, between $b'_{12.3\ldots}$ and $b''_{12.3\ldots}$, would be given by

$$\sigma_{b'_{12.3\ldots} - b''_{12.3\ldots}} = \sqrt{\sigma^2_{b'_{12.3\ldots}} + \sigma^2_{b''_{12.3\ldots}}} \qquad (14)$$

in which

$$\sigma_{b'_{12.3\ldots}} = \frac{\hat{\sigma}_{1.23\ldots}}{\sigma'_{2.3\ldots}\sqrt{N'}}, \qquad \sigma_{b''_{12.3\ldots}} = \frac{\hat{\sigma}_{1.23\ldots}}{\sigma''_{2.3\ldots}\sqrt{N''}}$$

The test of the difference could then be made by comparing the difference between the b's with the standard error of the difference, using the t distribution if the sample is small or the normal curve if the sample is large.

For example, suppose that, in a given problem, $N' = 50$, $b'_{12.3} = 2.7$, $(\sigma'_{1.23})^2 = 25$, and $(\sigma'_{2.3})^2 = 43$, while $N'' = 70$, $b''_{12.3} = 3.1$, $(\sigma''_{1.23})^2 = 36$, and $(\sigma''_{2.3})^2 = 48$. Then

$$\hat{\sigma}^2_{1.23} = \frac{(50)(25) + (70)(36)}{50 + 70 - 6} = 33.07$$

since $k = 3$, which is the number of regression statistics in the regression equation. Also,

$$\sigma^2_{b'_{12.3}} = \frac{33.07}{(43)(50)} = .0154$$

and

$$\sigma^2_{b''_{12.3}} = \frac{33.07}{(48)(70)} = .0098$$

Hence,

$$\sigma^2_{b'_{12.3} - b''_{12.3}} = .0154 + .0098 = .0252$$

and

$$\sigma_{b'_{12.3} - b''_{12.3}} = \sqrt{.0252} = .16$$

The difference between $b'_{12.3}$ and $b''_{12.3}$ is $-.4$, and the ratio of this difference to its standard error is $-.4/.16 = -2.5$. Since the samples are large, the sampling distribution of $b'_{12.3} - b''_{12.3}$ may be assumed to be normal. Hence a deviate of $-2.5$ lies beyond both the .05 point and the .025 point, i.e., beyond $-1.645$ and beyond $-1.96$, and the hypothesis that $b'_{12.3}$ and $b''_{12.3}$ came from the same population is to be rejected. That is, $b'_{12.3}$ must be deemed "significantly different" from $b''_{12.3}$.
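The calculation of Eqs. (13) and (14) can be scripted as follows. The sketch assumes variances with divisor N as in the example, that $k$ counts the constants in each regression equation, and that the samples are large enough for the normal curve; the function and parameter names are illustrative. Without the intermediate rounding it gives a deviate of about $-2.52$.

```python
import math


def compare_regression_coefficients(N1, b1, res_var1, x_var1,
                                    N2, b2, res_var2, x_var2, k):
    """Return (pooled residual variance, standard error, deviate).

    res_var: higher-order (residual) variance, e.g. sigma_{1.23}^2
    x_var:   variance of the independent variable, e.g. sigma_{2.3}^2
    k:       number of regression constants in each equation
    """
    pooled = (N1 * res_var1 + N2 * res_var2) / (N1 + N2 - 2 * k)   # Eq. (13)
    var_b1 = pooled / (x_var1 * N1)
    var_b2 = pooled / (x_var2 * N2)
    se = math.sqrt(var_b1 + var_b2)                                 # Eq. (14)
    return pooled, se, (b1 - b2) / se


pooled, se, dev = compare_regression_coefficients(50, 2.7, 25, 43,
                                                  70, 3.1, 36, 48, k=3)
print(f"pooled = {pooled:.2f}, standard error = {se:.3f}, deviate = {dev:.2f}")
print("significant at .05 (two-tailed)?", abs(dev) > 1.96)
```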


APPENDIX 


THE MEAN AND VARIANCE OF THE SUM OR DIFFERENCE OF
TWO VARIABLES

I. The mean. Let Z be equal to the sum (or difference) of the variables X and Y so that $Z_{ij} = X_i + Y_j$. Let X take on the values $X_1$, $X_2$, and $X_3$ and Y the values $Y_1$, $Y_2$, and $Y_3$; and let the joint relative frequencies, or probabilities, of pairs of X and Y be as follows:

$$\begin{array}{c|ccc} & Y_1 & Y_2 & Y_3 \\ \hline X_1 & p_{11} & p_{12} & p_{13} \\ X_2 & p_{21} & p_{22} & p_{23} \\ X_3 & p_{31} & p_{32} & p_{33} \end{array}$$

Thus $p_{12}$ means the probability of an $X_1Y_2$ combination, $p_{21}$ the probability of an $X_2Y_1$ combination, etc. In the interests of simplicity, only three values are taken for each variable. The argument is equally valid, however, for any number of values for each variable and for continuous as well as for discrete distributions.

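Since $Z_{ij} = X_i + Y_j$, the various values of Z and their probabilities are as follows:

$$\begin{array}{cc|cc|cc} Z & P(Z) & Z & P(Z) & Z & P(Z) \\ \hline X_1 + Y_1 & p_{11} & X_2 + Y_1 & p_{21} & X_3 + Y_1 & p_{31} \\ X_1 + Y_2 & p_{12} & X_2 + Y_2 & p_{22} & X_3 + Y_2 & p_{32} \\ X_1 + Y_3 & p_{13} & X_2 + Y_3 & p_{23} & X_3 + Y_3 & p_{33} \end{array}$$

This is the distribution of Z. The individual distributions of X and Y are as follows:

$$\begin{array}{cl|cl} X & P(X) & Y & P(Y) \\ \hline X_1 & p_1 = p_{11} + p_{12} + p_{13} & Y_1 & p'_1 = p_{11} + p_{21} + p_{31} \\ X_2 & p_2 = p_{21} + p_{22} + p_{23} & Y_2 & p'_2 = p_{12} + p_{22} + p_{32} \\ X_3 & p_3 = p_{31} + p_{32} + p_{33} & Y_3 & p'_3 = p_{13} + p_{23} + p_{33} \end{array}$$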
The problem of this section is to express the mean of Z in terms of the means of X and Y. By definition, $\bar{Z} = \Sigma P(Z)Z$, and when written out in full this becomes

$$\bar{Z} = p_{11}(X_1 + Y_1) + p_{21}(X_2 + Y_1) + p_{31}(X_3 + Y_1) + p_{12}(X_1 + Y_2) + p_{22}(X_2 + Y_2) + p_{32}(X_3 + Y_2) + p_{13}(X_1 + Y_3) + p_{23}(X_2 + Y_3) + p_{33}(X_3 + Y_3)$$

Upon removal of parentheses and collection of common terms, the mean of Z is seen to be equal to

$$\bar{Z} = [(p_{11} + p_{12} + p_{13})X_1 + (p_{21} + p_{22} + p_{23})X_2 + (p_{31} + p_{32} + p_{33})X_3] + [(p_{11} + p_{21} + p_{31})Y_1 + (p_{12} + p_{22} + p_{32})Y_2 + (p_{13} + p_{23} + p_{33})Y_3]$$

Hence,

$$\bar{Z} = (p_1X_1 + p_2X_2 + p_3X_3) + (p'_1Y_1 + p'_2Y_2 + p'_3Y_3) = \bar{X} + \bar{Y}$$

That is, the mean of the sum or difference of two variables is the sum or difference of their means. (This is true whether the variables are independent or correlated.)
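II. Variance. Let X and Y both be measured from their mean values, and let z be the sum (or difference) of X and Y when so measured. Hence $z_{ij} = x_i + y_j$. From I, it follows that $\bar{z} = \bar{x} + \bar{y}$; and since $\bar{x} = \bar{y} = 0$, $\bar{z}$ also equals 0. Thus the variances of the variables become

$$\sigma_z^2 = \Sigma p_{ij} z_{ij}^2, \qquad \sigma_x^2 = \Sigma p_i x_i^2, \qquad \sigma_y^2 = \Sigma p'_j y_j^2$$

The problem is to express $\sigma_z^2$ in terms of $\sigma_x^2$ and $\sigma_y^2$. When written out in full, $\sigma_z^2$ is as follows:

$$\sigma_z^2 = p_{11}(x_1 + y_1)^2 + p_{21}(x_2 + y_1)^2 + p_{31}(x_3 + y_1)^2 + p_{12}(x_1 + y_2)^2 + p_{22}(x_2 + y_2)^2 + p_{32}(x_3 + y_2)^2 + p_{13}(x_1 + y_3)^2 + p_{23}(x_2 + y_3)^2 + p_{33}(x_3 + y_3)^2$$

Upon clearing parentheses and collecting common terms, this becomes

$$\sigma_z^2 = [(p_{11} + p_{12} + p_{13})x_1^2 + (p_{21} + p_{22} + p_{23})x_2^2 + (p_{31} + p_{32} + p_{33})x_3^2] + [(p_{11} + p_{21} + p_{31})y_1^2 + (p_{12} + p_{22} + p_{32})y_2^2 + (p_{13} + p_{23} + p_{33})y_3^2] + 2(p_{11}x_1y_1 + p_{21}x_2y_1 + p_{31}x_3y_1 + p_{12}x_1y_2 + p_{22}x_2y_2 + p_{32}x_3y_2 + p_{13}x_1y_3 + p_{23}x_2y_3 + p_{33}x_3y_3)$$

which reduces to

$$\sigma_z^2 = (p_1x_1^2 + p_2x_2^2 + p_3x_3^2) + (p'_1y_1^2 + p'_2y_2^2 + p'_3y_3^2) + 2\Sigma p_{ij}x_iy_j$$

Accordingly,

$$\sigma_z^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y, \qquad \text{since } r = \frac{\Sigma p_{ij}x_iy_j}{\sigma_x\sigma_y}$$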

That is, the variance of a sum of two variables equals the sum of 
their variances plus twice the correlation coefficient times the product 
of the two standard deviations. Also, the variance of a difference of 
two variables equals the sum of their variances minus twice the 
correlation coefficient times the product of the two standard deviations. 
Finally, if the two variables are independent, r = 0 and the variance 
of their sum or difference equals the sum of their variances. 
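The results of this Appendix can be confirmed numerically for any joint distribution. The sketch below uses a made-up 3 × 3 table of probabilities (the values of X and Y and the $p_{ij}$ are hypothetical, chosen only so that the probabilities sum to 1) and exact fractions, so the identities hold without rounding. It checks them through the covariance $\Sigma p_{ij}x_iy_j$, which equals $r\sigma_x\sigma_y$.

```python
from fractions import Fraction as Fr
import math

# Hypothetical values and joint probabilities P[i][j] = P(X = XS[i], Y = YS[j])
XS = [1, 2, 4]
YS = [0, 3, 5]
P = [[Fr(1, 10), Fr(1, 20), Fr(1, 20)],
     [Fr(1, 20), Fr(2, 10), Fr(1, 10)],
     [Fr(1, 20), Fr(1, 10), Fr(3, 10)]]
assert sum(sum(row) for row in P) == 1


def expect(f):
    """Expected value of f(X, Y) under the joint distribution P."""
    return sum(P[i][j] * f(XS[i], YS[j]) for i in range(3) for j in range(3))


mx, my = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mx) ** 2)
var_y = expect(lambda x, y: (y - my) ** 2)
cov = expect(lambda x, y: (x - mx) * (y - my))   # equals r * sigma_x * sigma_y

mean_sum = expect(lambda x, y: x + y)
var_sum = expect(lambda x, y: (x + y - mean_sum) ** 2)
mean_diff = expect(lambda x, y: x - y)
var_diff = expect(lambda x, y: (x - y - mean_diff) ** 2)

r = float(cov) / math.sqrt(float(var_x * var_y))
print(f"r = {r:.3f}")
print("mean of sum == sum of means:", mean_sum == mx + my)
print("var of sum == var_x + var_y + 2 cov:", var_sum == var_x + var_y + 2 * cov)
print("var of difference == var_x + var_y - 2 cov:", var_diff == var_x + var_y - 2 * cov)
```

Replacing `P` with any other table whose entries sum to 1 should leave all three checks true, which is the point of the general argument above.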


CHAPTER XVII 
ANALYSIS OF VARIANCE 


In recent years, much use has been made of a so-called "analysis of variance."¹ This has been particularly true in the field of biological and agricultural experiments. The method has such wide application, however, that there is hardly any field of statistical investigation in which it cannot be employed.

Analysis of variance is essentially a method of testing for the existence of correlation or association. The technique consists in classification and cross classification on either a qualitative or a quantitative basis and in comparison of the variation from class to class with the variation within classes. The analysis is so arranged that variation within classes can be presumably attributed to chance. The test of association or correlation consists in comparing the variation between classes with the supposed chance variation within classes. The problems discussed below will illustrate the details of this procedure.


PROBLEMS INVOLVING A SINGLE BASIS OF CLASSIFICATION 


Nature of Problems. In the simplest cases of analysis of
variance there is only one basis of classification. This will
accordingly be the first type of problem to be discussed.


In Table 49, on page 426, are listed the grades of 15 representative
students² in an elementary course in economics. These are


¹ Historically, the method can be traced at least to the 1910 edition of
G. Udney Yule's Introduction to the Theory of Statistics, in which Chap. V on […]
A certain similarity […] Lexis's analysis of sub[…]
Cf. Rietz, H. L., Mathematical Statistics, […]; see W. Lexis, "Über die Theorie der
Stabilität statistischer Reihen," Jahrbücher für Nationalökonomie und
Statistik, Vol. 32 (1879), pp. 60-98; and W. Lexis, Abhandlungen zur Theorie
der Bevölkerungs- und Moralstatistik (1903), Chaps. V-IX.

² Actually, each grade is the average of the grades of several students,
but for the present analysis it will be considered a single individual grade.
classified according to the teacher of the student, and the problem
is to determine whether the difference in teaching has an effect 
on the grades of the students. If the difference in teaching has 
an effect, it will be reflected in variation in the mean grades of the 
students as between teachers. The problem is therefore to 
determine whether the variation in mean student grades from 
teacher to teacher is greater than can reasonably be attributed to 
chance. 

Theoretical Basis for the Analysis. It may be assumed that 
the variation between grades of students having the same teacher 
is due to the chance difference in ability of the students. This 
would be the case if the students were assigned to the teachers 
at random. If this variation of grades within each group is
pooled for all groups, a good estimate will be secured of how
much variation can be expected purely as a result of chance.
This will form a standard with which to compare the variation
in the mean student grade from teacher to teacher. Of course,
the mean grades cannot be expected to vary as much as individual
grades, since they are mean values.¹ Allowance for the smaller
variation among means can be made, however, by proper
weighting of the variation in the mean grades before the comparison
is made.
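A small simulation shows why the means must be weighted. The sketch below is an illustration under stated assumptions, not the authors' procedure. It draws repeated samples of N = 3 from a normal population with a hypothetical standard deviation of 8 and shows that the variance of the sample means is close to σ²/N. Multiplying that variance by N therefore recovers an estimate of σ² itself.

```python
import random


def variance_of_means(sigma, n, reps, seed=1):
    """Variance (divisor reps) of `reps` sample means, each of n normal draws."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(0.0, sigma) for _ in range(n)) / n for _ in range(reps)]
    centre = sum(means) / reps
    return sum((m - centre) ** 2 for m in means) / reps


sigma, n = 8.0, 3                  # hypothetical population s.d. and group size
v = variance_of_means(sigma, n, reps=20_000)
print(round(v, 2), round(sigma ** 2 / n, 2))   # should be close to each other
print(round(n * v, 2), sigma ** 2)             # N * var(mean) should be close to sigma**2
```

The seed only makes the run repeatable. The agreement comes from the large number of repetitions, not from the particular seed.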

The theoretical basis for the precise method of comparison 
that is used may be outlined as follows: The first step in the 
analysis is to set up the null hypothesis that the difference in 
teaching has no effect on the grades. Under these conditions
both the variation in the means of student grades from teacher
to teacher and the variation of grades within the groups of
students having the same teacher will stem back to the variation in
general in student grades, or to what may be called the "population
variance." If this is large, variation in the mean student
grades from teacher to teacher will tend to be large, as will also
variation around these means. If, on the other hand, the population
variance is small, the variation in mean student grades
from teacher to teacher will tend to be small and so will variation
around these means.
Accordingly, if the null hypothesis is correct, either the sample
variation in the mean student grades from teacher to teacher
or the variation around these mean grades could be used to
estimate the underlying population variance. Of course, the two
estimates could not be expected to be the same.² In one sample
the estimate based upon variation in the means would be larger
than that based upon variation around the means, and in another
sample the opposite might be true. If the null hypothesis is
true, however, the ratio of the first to the second would tend to
average in the neighborhood of unity, and large deviations from
unity would not be very likely. This suggests that, in any
particular problem, the null hypothesis could be tested by
determining whether the ratio of the two estimates deviated
unreasonably from unity.

¹ It will be recalled that the standard error of a mean is equal to σ/√N.

Such, in general, is the theoretical basis of analysis of variance. 
To put it to use, however, requires more exact specifications
regarding the nature of the two estimates of variance and the 
form of the sampling distribution of their ratio. 

The estimates of variance that are used are maximum-likelihood
estimates. If a large number of this kind of sample estimates
is obtained, their mean will have approximately the same
value as that of the population variance. In other words, the
mean of the sampling distribution of a maximum-likelihood
estimate of variance is the population variance. For this reason
a maximum-likelihood estimate of variance is also called an
"unbiased" estimate.

Mathematical analysis shows that the maximum-likelihood
estimate of the population variance that is based upon the
variation in the means is given by

$$\frac{\sum_r N_r(\bar X_r - \bar X)^2}{r - 1}$$

that is, by the weighted sum of the squared deviations of the
means about the mean of the entire sample of grades divided by
the number of means minus 1. Here $N_r$ is the number of student
grades entering each mean, and r is the number of means, which in
this problem would also be the number of teachers. This is an
estimate of the population variance based on r − 1 degrees of freedom.


² The two estimates are independent of each other,
so that the value of one is not related to the value of the other.
A second maximum-likelihood estimate of the population
variance is that based upon the variation around the means,
which is given by

$$\frac{\sum_r \sum_i (X_i - \bar X_r)^2}{N - r}$$

that is, by the pooled sum of the squared deviations from the
individual means divided by the number of cases minus the number
of means. This is an estimate of the population variance based
on N − r degrees of freedom.

These are the two estimates of the population variance that 
are used in the analysis of variance. The next question is: What 
is the sampling distribution of their ratio? The answer given by
mathematical analysis is as follows: If the original population is 
normal and if the null hypothesis is correct, the ratio of the two 
maximum-likelihood estimates of the population variance will 
tend to fluctuate from sample to sample in accordance with the 
F distribution. If the estimate based upon the variation in the 
means is put in the numerator of the ratio and the estimate based 
upon the variation around the means is put in the denominator, 
the appropriate F curve is that for which $n_1 = r - 1$ and
$n_2 = N - r$.

The Numerical Analysis. Before proceeding to the application
of the foregoing theory to a concrete problem, certain mathematical
relationships should be noted that will be helpful in
carrying out the numerical calculations. A study of the theoretical
formulas shows that the numerator of each of the estimates
of variance is a sum of squares. These could be calculated
directly, of course, but it is usually easier to make use of the
following identity,

$$\sum (X_i - \bar X)^2 = \sum_r N_r(\bar X_r - \bar X)^2 + \sum_r \sum_i (X_i - \bar X_r)^2 \qquad (1)$$

which says that the total variation in the data, as represented
by the total sum of squares, may be broken up into two parts,
one consisting of the variation in the means (as represented by the
weighted sum of the squared deviations of the individual means
from the grand mean) and the other consisting of the variation
around the means (as represented by the pooled sum of the
squared deviations of the individual items from the mean of
each group).
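Identity (1) can be verified directly on the grades of Table 49. The following Python sketch is an illustration, and the function name `decompose` is ours. It computes each of the three sums of squares from its definition and confirms that the two parts add up to the total.

```python
from math import isclose

# Grades of Table 49, one list per teacher (I to V).
table_49 = {
    "I": [83.25, 77.50, 71.00],
    "II": [88.75, 74.75, 70.00],
    "III": [76.25, 67.25, 69.25],
    "IV": [78.75, 68.75, 62.25],
    "V": [81.50, 75.75, 64.75],
}


def decompose(groups):
    """Return (total, between-means, within-groups) sums of squares."""
    items = [x for g in groups.values() for x in g]
    grand = sum(items) / len(items)
    total = sum((x - grand) ** 2 for x in items)
    between = 0.0
    within = 0.0
    for g in groups.values():
        m = sum(g) / len(g)
        between += len(g) * (m - grand) ** 2     # weighted by N_r
        within += sum((x - m) ** 2 for x in g)   # deviations about each group mean
    return total, between, within


total, between, within = decompose(table_49)
print(round(total, 4), round(between, 4), round(within, 4))  # 747.1833 154.3083 592.875
print(isclose(total, between + within))                      # True
```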
Since it is usually easier to calculate the total sum of squares
and that pertaining to the means than it is to compute the sum
of squares of the residual deviations, it is common practice
to calculate the latter by taking the difference between the other
two more easily calculated values.

In calculating the sums of squares the following special formulas
are found useful. Thus the total sum of squares can be
computed from the equation¹

$$\sum (X_i - \bar X)^2 = \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N} \qquad (2)$$

and the weighted sum of squares of the deviations of the individual
means from the grand mean can be computed from the
equation²

$$\sum_r N_r(\bar X_r - \bar X)^2 = \sum_r \frac{\left[\sum X\right]_r^2}{N_r} - \frac{\left(\sum X_i\right)^2}{N} \qquad (3)$$

in which $[\sum X]_r^2$ refers to the square of the sum of the grades
in the rth row.

TABLE 49.—GRADES OF REPRESENTATIVE STUDENTS¹ CLASSIFIED BY TEACHERS

Teacher    Grades of students          Mean grade
I          83.25    77.50    71.00     77.25
II         88.75    74.75    70.00     77.83
III        76.25    67.25    69.25     70.92
IV         78.75    68.75    62.25     69.92
V          81.50    75.75    64.75     74.00

¹ Grades are actually means of several students' grades, but they are considered as if they
were individual students' grades; i.e., each teacher has three students.

The application of these equations to the data of Table 49 is
carried out in the calculations of Table 50. This latter table
shows that $\sum X_i^2 = 82{,}850.1875$ and $(\sum X_i)^2/N = 82{,}103.0042$.
Hence the total sum of squares $\sum (X_i - \bar X)^2$ is equal to
82,850.1875 − 82,103.0042 = 747.1833.

¹ $\sum (X_i - \bar X)^2 = \sum X_i^2 - 2\bar X \sum X_i + N\bar X^2 = \sum X_i^2 - (\sum X_i)^2/N$.

² The equation follows from the fact that $\sum N_r(\bar X_r - \bar X)^2 = \sum N_r \bar X_r^2 - N\bar X^2$, since
$N_r \bar X_r = [\sum X]_r$ for each row, so that $N_r \bar X_r^2 = [\sum X]_r^2/N_r$, and the total for all rows is
$\sum N_r \bar X_r = \sum X_i$.
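The calculating-machine route of Eqs. (2) and (3) can be sketched in the same way, with the residual sum of squares obtained by subtraction. This is an illustrative Python sketch that works from the raw sums shown in Table 50. The variable names are ours.

```python
# Grades of Table 49 by teacher (rows I to V).
rows = [
    [83.25, 77.50, 71.00],
    [88.75, 74.75, 70.00],
    [76.25, 67.25, 69.25],
    [78.75, 68.75, 62.25],
    [81.50, 75.75, 64.75],
]

N = sum(len(row) for row in rows)                 # 15 grades
sum_x = sum(sum(row) for row in rows)             # 1,109.75
sum_x2 = sum(x * x for row in rows for x in row)  # 82,850.1875
correction = sum_x ** 2 / N                       # (sum X)^2 / N = 82,103.0042

total_ss = sum_x2 - correction                                         # Eq. (2)
means_ss = sum(sum(row) ** 2 / len(row) for row in rows) - correction  # Eq. (3)
residual_ss = total_ss - means_ss                 # found by subtraction

print(f"{total_ss:.4f} {means_ss:.4f} {residual_ss:.4f}")  # 747.1833 154.3083 592.8750
```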
Table 50 also shows that

$$\sum_r \frac{\left[\sum X\right]_r^2}{N_r} - \frac{\left(\sum X_i\right)^2}{N} = 82{,}257.3125 - 82{,}103.0042 = 154.3083$$

which is the sum of the weighted squared deviations of the
individual means from the grand mean. The difference between
these two results gives the pooled sum of the squared deviations
of the individual items from the row means. This last sum of
squares, $\sum_r \sum_i (X_i - \bar X_r)^2$, is thus equal to 747.1833 − 154.3083
= 592.8750.

TABLE 50.—WORKSHEET FOR CALCULATING THE VARIOUS SUMS OF SQUARES
FOR ANALYSIS OF VARIANCE

                     Grades                        Squares of
Teacher   Individual grades  Partial sums   Individual grades  Partial sums
I         83.25                             6,930.5625
          77.50                             6,006.2500
          71.00              231.75         5,041.0000         53,708.0625
II        88.75                             7,876.5625
          74.75                             5,587.5625
          70.00              233.50         4,900.0000         54,522.2500
III       76.25                             5,814.0625
          67.25                             4,522.5625
          69.25              212.75         4,795.5625         45,262.5625
IV        78.75                             6,201.5625
          68.75                             4,726.5625
          62.25              209.75         3,875.0625         43,995.0625
V         81.50                             6,642.2500
          75.75                             5,738.0625
          64.75              222.00         4,192.5625         49,284.0000
Total                        1,109.75       82,850.1875        246,771.9375
                             = ΣX_i         = ΣX_i²            = Σ[ΣX]_r²

(ΣX_i)²/N = 1,231,545.0625/15 = 82,103.0042
Σ[ΣX]_r²/N_r = 246,771.9375/3 = 82,257.3125


To make the appropriate estimates of the population variance,
the last two sums of squares are each divided by the proper
degrees of freedom. Thus 154.3083 is divided by 5 − 1 = 4 to
give 38.5771 as the estimate of the population variance based
on the variation among teachers in mean student grades.
Similarly, 592.8750 is divided by 15 − 5 = 10 to give 59.2875 as the
estimate of variance based on the variation around the means.
The ratio of these two estimates is 38.5771/59.2875 = .65.
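Dividing by the degrees of freedom and forming the ratio takes only a few lines of arithmetic. This is a minimal sketch that uses the two sums of squares and the values r = 5 and N = 15 given in the text.

```python
between_ss, within_ss = 154.3083, 592.8750   # sums of squares from the text
r, N = 5, 15                                 # number of teachers, number of grades

between_est = between_ss / (r - 1)   # estimate from the variation among means, 4 d.f.
within_est = within_ss / (N - r)     # estimate from the variation around means, 10 d.f.
ratio = between_est / within_est

print(round(between_est, 4), round(within_est, 4), round(ratio, 2))  # 38.5771 59.2875 0.65
```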

To determine whether this sample ratio justifies the rejection
or acceptance of the null hypothesis requires the selection of a
suitable region of rejection. On the assumption that the risk
of rejecting the null hypothesis when it is true will be put at the
usual figure of 1 in 20, the size of the region of rejection will be
.05. This is purely arbitrary, however, and might be set at
.01 or .10 or any other figure, depending on the coefficient of risk
adopted. A more significant question relates to the distribution
of the region. Since the difference in teaching, if it had any
effect, would tend to be revealed in a large variation among
teachers in mean student grades, it is large values of the ratio that
call for rejection, and the proper region of rejection is the upper
.05 tail of the F distribution. This is the region adopted in this
and in all the following problems. Since the sample ratio is only
.65, the null hypothesis is accepted:¹ the variation in mean
student grades from teacher to teacher is attributed to
chance and not to any difference in teaching.

¹ If the ratio is less than unity, it will never fall in the region of rejection.
In such cases, calculation of the two estimates of variance will of itself give
sufficient evidence for acceptance of the null hypothesis.
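The choice of the upper tail and the meaning of the tabled .05 point can be illustrated by simulation. The Python sketch below is an illustration, not the book's method. It assumes a normal population and a true null hypothesis, repeatedly draws 5 groups of 3 (the layout of Table 49), and computes the same ratio each time. The simulated upper .05 point should come out close to the tabled F value for $n_1 = 4$ and $n_2 = 10$, which is about 3.48. The sample ratio .65 lies far below that point.

```python
import random


def simulated_ratios(k, n, reps, seed=7):
    """Sorted ratios of the between-means estimate to the within-groups estimate
    for k groups of n normal observations sharing one mean (the null hypothesis)."""
    rng = random.Random(seed)
    total_n = k * n
    ratios = []
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        means = [sum(g) / n for g in groups]
        grand = sum(means) / k                     # equal group sizes
        between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
        within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (total_n - k)
        ratios.append(between / within)
    return sorted(ratios)


ratios = simulated_ratios(k=5, n=3, reps=20_000)
upper_05 = ratios[int(0.95 * len(ratios))]   # simulated .05 point of the ratio
print(round(upper_05, 2))
```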
PROBLEMS INVOLVING MORE THAN ONE BASIS
OF CLASSIFICATION 


A Single Case in Each Class. In the previous problem it 
might have been suspected that, if allowance were made for the 
general standing of the students, the difference in teaching might 
be revealed. Suppose, as is actually the case, that the first
student of each group is a high-standing student, the second is
one of average standing, and the third one of low standing. Then
the 15 grades may be classified according to two criteria, standing 
and teacher, and mean grades may be calculated for the divisions 
of each classification. This is done in Table 51, on page 482. 

Under such circumstances, two questions may be asked: (1) 
If allowance is made for the difference in standing, has the differ- 
ence in teaching any significant effect on grades in the given 
course? (2) If allowance is made for the difference in teaching, 
has the difference in standing any significant effect on grades in 


the given course? The former question is the one that primarily
concerns us here, but the second may also be of interest.
Theoretical Basis for the Analysis. In this more extended
problem there are three types of variation to be considered.
First, there is the variation in the means of the rows, the mean
student grades for different teachers; second, the variation in the
means of the columns, the mean grades of students of different
standings; third, the variation in the individual grades about
what would be expected from the combined row (teaching) and
column (standing) effects.

This third variation needs further explanation. If a student's
grade differed from the grand mean of all the students' grades
by just the same amount that the average grade of all students
having the same teacher differed from the grand mean plus the
amount that the average grade of all students having the same
standing differed from the grand mean, the grade of this student
would be equal to $\bar X + (\bar X_r - \bar X) + (\bar X_c - \bar X)$. In nearly
every case a student's grade does not come exactly to the sum of
this combined row and column effect but differs from it. This
difference will be given by¹

$$X_{rc} - \bar X - (\bar X_r - \bar X) - (\bar X_c - \bar X) = X_{rc} - \bar X_r - \bar X_c + \bar X$$

It is the variation in such differences that constitutes the third
type of variation distinguished above. While the variation in the
row means might be due to the difference in teaching and the
variation in the column means might be due to the difference in
standing, this last variation is presumably due to chance. For
brevity it will be called the "remainder" variation.

If the null hypothesis is set up that the difference in teaching
has no effect on the grades, the size of both the variation in the
row means and the remainder variation will depend on the size
of the variation in students' grades in general, i.e., on the size of
the population variance. As in the previous problem, each of
these two kinds of variation could be used to make an independent
estimate of the population variance, and their ratio could be used
to test the null hypothesis. The analysis is essentially the same
as in the previous instance except that now, according to the
hypothesis, the measure of the chance variation does not contain
any possible effect of the difference in standing. This test of
teaching is independent of any possible effect of standing.

In the same manner, the null hypothesis that the difference in
standing does not have any effect on the students' grades can be
tested by comparing an estimate of the population variance
based on the variation in the column (standing) means with an
estimate based on the remainder (chance) variation. This is a
test of standing that is independent of the possible effect of
teaching.

In making the foregoing tests, the maximum-likelihood estimates
of the population variance that are used are as follows.²
The estimate of the population variance based upon the variation
in the row means is

$$\frac{\sum_r N_r(\bar X_r - \bar X)^2}{r - 1}$$

The estimate based upon the variation in the column means is

$$\frac{\sum_c N_c(\bar X_c - \bar X)^2}{c - 1}$$

and the estimate based upon the remainder variation is

$$\frac{\sum_r \sum_c (X_{rc} - \bar X_r - \bar X_c + \bar X)^2}{(r - 1)(c - 1)}$$

¹ Individual grades are now designated by $X_{rc}$ instead of $X_i$, since any
individual student has both a teacher and a standing and there is only one
student for any given combination of the two.

² That these are the proper formulas can be demonstrated by
mathematical analysis similar to that of Chap. X. For further discussion
see J. O. Irwin, "Mathematical Theorems Involved in the Analysis of
Variance," Journal of the Royal Statistical Society, Vol. 94 (1931), p. 284.


If the null hypothesis that teaching has no effect is correct,
and if the population is normal, the ratio of the first estimate to
the last will have a sampling distribution that is of the form of
the F distribution with $n_1 = r - 1$ and $n_2 = (r - 1)(c - 1)$.
Likewise, if the null hypothesis that standing has no effect is
correct and if the population is normal, the ratio of the second
estimate to the last will have a sampling distribution of the form
of the F distribution with $n_1 = c - 1$ and $n_2 = (r - 1)(c - 1)$.
It is upon these theoretical conclusions that the following 
numerical analysis rests. 

The Numerical Analysis. In putting the foregoing theory into
practice, use is made of the identity

$$\sum_r \sum_c (X_{rc} - \bar X)^2 = \sum_r N_r(\bar X_r - \bar X)^2 + \sum_c N_c(\bar X_c - \bar X)^2 + \sum_r \sum_c (X_{rc} - \bar X_r - \bar X_c + \bar X)^2 \qquad (4)$$

which merely says that the total sum of squares is equal to the
sum of the sums of squares for each of the three variations: the
variation in the means of the rows, the variation in the means of
the columns, and the remainder variation. The total sum of
squares and the sum of squares for the means of the rows have
already been computed. If the sum of squares for the means
of the columns is also computed, the remainder sum of squares
can be calculated from the foregoing identity.
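The three-way split of identity (4) can be checked on Table 51. In this illustrative Python sketch, where the function name is ours, each component is computed from its definition rather than by subtraction. The check is therefore a genuine test of the identity.

```python
from math import isclose

# Table 51: rows are teachers I to V, columns are standing (high, medium, low).
grades = [
    [83.25, 77.50, 71.00],
    [88.75, 74.75, 70.00],
    [76.25, 67.25, 69.25],
    [78.75, 68.75, 62.25],
    [81.50, 75.75, 64.75],
]


def two_way_single(grades):
    """Total, row, column and remainder sums of squares, one item per cell."""
    r, c = len(grades), len(grades[0])
    grand = sum(map(sum, grades)) / (r * c)
    row_means = [sum(row) / c for row in grades]
    col_means = [sum(grades[i][j] for i in range(r)) / r for j in range(c)]
    total = sum((x - grand) ** 2 for row in grades for x in row)
    rows_ss = c * sum((m - grand) ** 2 for m in row_means)   # N_r = c
    cols_ss = r * sum((m - grand) ** 2 for m in col_means)   # N_c = r
    remainder = sum((grades[i][j] - row_means[i] - col_means[j] + grand) ** 2
                    for i in range(r) for j in range(c))
    return total, rows_ss, cols_ss, remainder


total, rows_ss, cols_ss, remainder = two_way_single(grades)
print(f"{total:.4f} {rows_ss:.4f} {cols_ss:.4f} {remainder:.4f}")
# 747.1833 154.3083 518.1583 74.7167
print(isclose(total, rows_ss + cols_ss + remainder))  # True
```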

The equation for the calculation of the sum of squares for the
means of the columns is similar to that for the means of the rows
and takes the form

$$\sum_c N_c(\bar X_c - \bar X)^2 = \sum_c \frac{\left[\sum X\right]_c^2}{N_c} - \frac{\left(\sum X\right)^2}{N} \qquad (5)$$

in which $[\sum X]_c^2$ refers to the squared sum of the r grades in
the cth column. For the given data, this is equal to 518.1583, as
the following figures show. From Tables 51 and 52, it is seen that

$$\sum_c \frac{\left[\sum X\right]_c^2}{N_c} = \frac{413{,}105.8125}{5} = 82{,}621.1625$$

From Table 50 it was found that

$$\frac{\left(\sum X\right)^2}{N} = 82{,}103.0042$$

Hence, by Eq. (5),

$$\sum_c N_c(\bar X_c - \bar X)^2 = 82{,}621.1625 - 82{,}103.0042 = 518.1583$$

TABLE 51.—GRADES OF STUDENTS CLASSIFIED BY TEACHER AND STANDING

                         Standing
Teacher        High     Medium     Low      Mean grade
I              83.25    77.50      71.00    77.25
II             88.75    74.75      70.00    77.83
III            76.25    67.25      69.25    70.92
IV             78.75    68.75      62.25    69.92
V              81.50    75.75      64.75    74.00
Mean grade     81.70    72.80      67.45    Grand mean 73.98

TABLE 52.—CALCULATION OF SUM OF SQUARES FOR MEANS OF COLUMNS

                    Sums of columns    Squares of sums
High standing       408.50             166,872.2500
Medium standing     364.00             132,496.0000
Low standing        337.25             113,737.5625
Total               1,109.75 = ΣX      413,105.8125 = Σ[ΣX]_c²

The remainder variation, $\sum_r \sum_c (X_{rc} - \bar X_r - \bar X_c + \bar X)^2$, is
calculated from the identity (4) and is found to be equal to¹

747.1833 − 518.1583 − 154.3083 = 74.7167


The foregoing data may be assembled in an "analysis of
variance" table (Table 53), in which the various estimates of
$\sigma^2$ can be calculated.

TABLE 53.—ANALYSIS OF VARIANCE

Source of variation   Sum of squares   Degrees of freedom       Unbiased estimate of σ²
Means of rows         154.3083         r − 1 = 4                38.5771
Means of columns      518.1583         c − 1 = 2                259.0792
Remainder             74.7167          (r − 1)(c − 1) = 8       9.3396
Total                 747.1833         N − 1 = 14


To test the null hypothesis that the variation in the means of
the columns is due to chance, i.e., that the difference in standing
has no effect on the grades in the given course, the ratio
259.0792/9.3396 is calculated. The result is 27.74. For
$n_1 = 2$ and $n_2 = 8$, the .05 value for the F distribution is 4.459.
The sample ratio 27.74 is much greater than 4.459 and thus
clearly lies in the region of rejection. The null hypothesis cannot
be accepted, and it is to be concluded that the difference in
standing does have a definite bearing on the grades obtained in
the given course.

To test the null hypothesis that the variation in the means of
the rows is due to chance, i.e., that the difference in teaching has
no effect on the grades, the ratio 38.5771/9.3396 is calculated.
The result is 4.13. For $n_1 = 4$ and $n_2 = 8$, the .05 point of the
F distribution is 3.838. The proximity of the sample value
4.13 to the .05 point, 3.838, suggests that this problem offers a
borderline case. Nevertheless, the sample value does lie in the
region of rejection, since it is the larger; and if the rule of
procedure is strictly followed the second null hypothesis cannot be
accepted. It may be concluded in this case that the difference
in mean teacher grades is to be attributed to the difference in
teaching.
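Both tests can be restated as a short computation. This is a minimal sketch that uses the entries of Table 53 and the .05 points of F quoted above. The function name `f_test` is ours.

```python
# Entries of Table 53 and the .05 points of F quoted in the text.
remainder_ss, remainder_df = 74.7167, 8
sources = {
    # name: (sum of squares, degrees of freedom, .05 point of F for (df, 8))
    "rows (teaching)": (154.3083, 4, 3.838),
    "columns (standing)": (518.1583, 2, 4.459),
}

remainder_est = remainder_ss / remainder_df   # chance estimate, 8 d.f.


def f_test(ss, df, f05):
    """Return the variance ratio and whether it falls in the upper .05 region."""
    ratio = (ss / df) / remainder_est
    return ratio, ratio > f05


results = {name: f_test(*entry) for name, entry in sources.items()}
for name, (ratio, reject) in results.items():
    print(f"{name}: F = {ratio:.2f}, reject null hypothesis: {reject}")
# rows (teaching): F = 4.13, reject null hypothesis: True
# columns (standing): F = 27.74, reject null hypothesis: True
```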
¹ The total sum of squares is 747.1833 and was calculated on p. 426. The
sum of squares for the row means is 154.3083 and was calculated on p. 427.
It will be recalled that in the previous case the hypothesis that
the difference in teaching had no effect on the grades was not
rejected. Here it is rejected. The reason for the different
conclusion lies in the different measure of chance variation. In the
former problem all the variation over and above the variation
in the mean student grades from teacher to teacher was taken
to be chance variation. This represented chance variation in
student ability in the course, due to any cause whatever. Compared
with such chance variation, the variation in mean student
grades from teacher to teacher was not deemed significant.
That is, the variation in mean student grades from teacher to
teacher might easily have been explained by the chance difference
in ability of the students assigned to each teacher.¹

In the second problem, in which not only different teachers but
also differences in standing are involved, the measure of chance
variation is taken to be the variation remaining after the variation
due to difference in teaching and the variation due to
difference in standing have both been eliminated. That is, it is
the chance variation in student ability within the groups of high,
average, and low standing: the chance variation remaining
after student standing has been roughly accounted for. Compared
with such a chance variation the difference in teaching does
appear to have some effect.


In conclusion, it may be noted that in the second problem the test
of the variation in the means of the rows is independent of
whether the variation in the means of the columns is due to
chance or not. The test is valid in any case. If it should appear,
however, that the variation in the means of the columns is due to
chance, it would be better to combine this variation with the
remainder variation so as to get an estimate of chance variation
than to use the present test, because an estimate based on a larger
number of degrees of freedom gives a more reliable estimate of
the chance variation. What has been said about variation in the
means of the rows applies in turn to the variation in the means of
the columns. The test is equally independent of whether the
variation in the means of the rows is due to chance or not. If
the latter is due to chance, however, the test of the former is
best based upon a chance variation that includes the variation
in the means of the rows.

¹ Strictly speaking, the analysis in that problem was not valid owing to
the way in which the students' grades were selected. The students were not
assigned to the teachers entirely at random, but in order to serve for further
analysis the students were so chosen that each teacher had an equal number
of high-, average-, and low-standing students.

More than One Case in Each Class. The Problem. In the 
preceding problems it was assumed that there was only one 
student of each standing assigned to each teacher. As a matter
of fact, four students of each standing were assigned to each 
teacher, and the grades of the previous problem were actually 
the mean grades of these groups of four students, rather than 
grades of individual students. The full data are given in Table 
54, on page 437. 

The problem now to be considered is the analysis of variance 
of this full set of data shown in Table 54. Three questions may 
be asked: (1) Is the variation in the means of the rows greater 
than may be reasonably attributed to chance? (2) Is the varia- 
tion in the column means greater than may reasonably be attri- 
buted to chance? (3) Are the individual cell means what may 
be reasonably expected on the assumption that each individual 
item is a mere sum of a row effect plus a column effect, or do these 
cell means give evidence of an interaction between row and column
effects? In other words, the three questions are: (1) Has the 
difference in teaching an effect on grades? (2) Does difference 
in general ability affect grades? (3) Is there any interaction 
between teacher and student ability, i.e., does one teacher do
better with high- (or low-) standing students than another? 

Theoretical Basis for the Analysis. In this problem there are 
four types of variation to be considered. There are the variation
in the row means, the variation in the column means, the varia- 
tion in the means of each cell about what may be expected from 
a linear combination of row and column effects (the so-called 
“interaction ”), and, finally, the variation in the individual items 
about the means of each cell. 

The variation in the individual items about the means of the 
cells is presumably due to chance and may be spoken of as the 
“chance” or “remainder” variation. It will be noted that this 
is not the same as the remainder variation of the preceding 
problem. The remainder variation in the preceding problem
is now the "interaction." Previously, the remainder variation
was presumed to represent chance, for there was no other basis
for measuring chance variation. Here there is another basis for
measuring chance variation, and the former remainder variation
may now be tested for the possible existence of correlation or
interaction.
In the present problem three hypotheses may be tested. First,
there is the null hypothesis that the difference in teaching has no 
effect on grades; second, the null hypothesis that the difference 
in standing has no effect on grades; third, the null hypothesis 
that there is no real interaction between teaching and standing. 
To test the first hypothesis an estimate is made of the popula- 
tion variance based upon the variation in the row means, and 
this is compared with an estimate based upon the remainder or 
chance variation. To test the second hypothesis an estimate 
of the population variance based upon the variation in the column 
means is compared with the estimate based upon chance varia- 
tion. To test the third hypothesis an estimate based upon the 
interaction is compared with the chance estimate. The estimates 
based upon the different sources of variation are the following; 
each is a maximum-likelihood, or unbiased, estimate: 
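Estimate based on the variation in the means of rows:

$$\frac{\sum_r N_r(\bar X_r - \bar X)^2}{r - 1}$$

Estimate based upon the variation in the means of the columns:

$$\frac{\sum_c N_c(\bar X_c - \bar X)^2}{c - 1}$$

Estimate based upon the interaction:

$$\frac{\sum_r \sum_c N_{rc}(\bar X_{rc} - \bar X_r - \bar X_c + \bar X)^2}{(r - 1)(c - 1)}$$

Estimate based upon the remainder or chance variation:

$$\frac{\sum_r \sum_c \sum_i (X_i - \bar X_{rc})^2}{N - rc}$$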
Mathematical analysis shows that, if the first null hypothesis
is correct and if the population is normal, the ratio of the first
estimate to the last has a sampling distribution of the form of
the F distribution with $n_1 = r - 1$ and $n_2 = N - rc$. If the
second null hypothesis is correct and the population is normal,
the ratio of the second estimate to the last has a sampling
distribution of the form of the F distribution with $n_1 = c - 1$ and
$n_2 = N - rc$; and if the third null hypothesis is correct and the
population is normal, the ratio of the third estimate to the last
has a sampling distribution of the form of the F distribution with
$n_1 = (r - 1)(c - 1)$ and $n_2 = N - rc$. The analysis thus
proceeds in much the same manner as in the previous problems.

TABLE 54.—GRADES OF STUDENTS IN A GIVEN COLLEGE COURSE CLASSIFIED
ACCORDING TO THEIR GENERAL STANDING AND TEACHERS

                          General standing
Teacher             High     Medium    Low      Means of rows
I                   90       87        80
                    76       77        77
                    85       75        68
                    82       71        59
   Average          83.25    77.50     71.00    77.25
II                  88       58        72
                    85       77        58
                    90       81        80
                    92       83        70
   Average          88.75    74.75     70.00    77.83
III                 84       62        61
                    80       73        77
                    60       70        72
                    81       64        67
   Average          76.25    67.25     69.25    70.92
IV                  80       65        63
                    76       69        70
                    90       77        46
                    69       64        70
   Average          78.75    68.75     62.25    69.92
V                   80       80        47
                    80       80        62
                    75       76        81
                    91       67        69
   Average          81.50    75.75     64.75    74.00
Means of columns    81.70    72.80     67.45    Grand mean 73.98


TABLE 55.—CALCULATIONS FOR ANALYSIS OF VARIANCE

Squares of individual grades
Teacher   High     Medium   Low
I         8,100    7,569    6,400
          5,776    5,929    5,929
          7,225    5,625    4,624
          6,724    5,041    3,481
II        7,744    3,364    5,184
          7,225    5,929    3,364
          8,100    6,561    6,400
          8,464    6,889    4,900
III       7,056    3,844    3,721
          6,400    5,329    5,929
          3,600    4,900    5,184
          6,561    4,096    4,489
IV        6,400    4,225    3,969
          5,776    4,761    4,900
          8,100    5,929    2,116
          4,761    4,096    4,900
V         6,400    6,400    2,209
          6,400    6,400    3,844
          5,625    5,776    6,561
          8,281    4,489    4,761
Total = 334,735 = ΣX²

Sums of row grades and squares of sums of row grades
Teacher   Sum            Square of sum
I         927            859,329
II        934            872,356
III       851            724,201
IV        839            703,921
V         888            788,544
Total     4,439 = ΣX     3,948,351 = Σ[ΣX]_r²

Sums of column grades and squares of sums of column grades
Standing  Sum            Square of sum
High      1,634          2,669,956
Medium    1,456          2,119,936
Low       1,349          1,819,801
Total     4,439 = ΣX     6,609,693 = Σ[ΣX]_c²


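The Numerical Analysis. In the numerical calculations use is
again made of an identity in which the total sum of squares is
broken up into its various components. This identity is as
follows:

$$\begin{aligned}\sum (X_i - \bar X)^2 ={}& \sum_r N_r(\bar X_r - \bar X)^2 + \sum_c N_c(\bar X_c - \bar X)^2 \\ &+ \sum_r \sum_c N_{rc}(\bar X_{rc} - \bar X_r - \bar X_c + \bar X)^2 + \sum_r \sum_c \sum_i (X_i - \bar X_{rc})^2 \qquad (6)\end{aligned}$$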
As in the other problems, the first four sums of squares are
calculated directly, and the remainder sum of squares is calculated
as a residual. These calculations are shown in Table 55.

From Table 54 it is seen that the number of grades in each row
is 12: each teacher has the grades of 4 students of each standing
(high, medium, and low), making 12 altogether; hence $N_r = 12$.
The number of grades in each column is 20; therefore, $N_c = 20$.
The number of grades in each cell (one column within one row) is 4;
thus, $N_{rc} = 4$. The total number of grades is 60, and N = 60. The
following calculations may now be made:

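The total sum of squares is best calculated from the formula

$$\sum (X_i - \bar X)^2 = \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N} \qquad (7)$$

Table 55 shows that this is equal to

$$\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N} = 334{,}735 - \frac{(4{,}439)^2}{60} = 334{,}735 - 328{,}412.02 = 6{,}322.98$$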


The weighted sum of the squared deviations of the means of
the rows is calculated from the equation

$$\sum_r N_r(\bar{X}_r-\bar{X})^2 = \sum_r \frac{\left[\sum X_r\right]^2}{N_r} - \frac{\left(\sum X_i\right)^2}{N} \qquad (8)$$

where $[\sum X_r]^2$ represents the square of the sum of the indi-
vidual items in the rth row. Table 55 shows this second sum of
squares to be

$$\frac{3{,}948{,}351}{12} - \frac{(4{,}439)^2}{60} = 329{,}029.25 - 328{,}412.02 = 617.23$$
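Formulas (7) and (8), and the corresponding formula for the columns, need only the totals in Table 55. A minimal Python sketch, assuming the row and column sums listed there, reproduces the first three sums of squares; the helper names are hypothetical. Squaring the three column sums exactly gives 6,609,693 rather than the 6,609,694 carried in the text, so the column figure comes out 2,072.63 instead of 2,072.68, a difference too small to affect any conclusion.

```python
def correction_term(grand_total, n):
    """(sum of all items)^2 / N, the term subtracted in Eqs. (7) and (8)."""
    return grand_total ** 2 / n

def between_groups_ss(group_sums, group_size, grand_total, n):
    """Sum of [group sum]^2 / N_g minus (sum X)^2 / N, for equal group sizes N_g."""
    return sum(s ** 2 for s in group_sums) / group_size - correction_term(grand_total, n)

N, GRAND_TOTAL, SUM_OF_SQUARES = 60, 4439, 334_735     # from Table 55
row_sums = [927, 934, 851, 839, 888]                    # N_r = 12 grades per row
col_sums = [1634, 1456, 1349]                           # N_c = 20 grades per column

total_ss = SUM_OF_SQUARES - correction_term(GRAND_TOTAL, N)     # Eq. (7)
row_ss = between_groups_ss(row_sums, 12, GRAND_TOTAL, N)        # Eq. (8)
col_ss = between_groups_ss(col_sums, 20, GRAND_TOTAL, N)        # same form, columns
print(f"{total_ss:.2f} {row_ss:.2f} {col_ss:.2f}")  # prints: 6322.98 617.23 2072.63
```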


The weighted sum of the squared deviations of the means of
the columns may be calculated from the equation

$$\sum_c N_c(\bar{X}_c-\bar{X})^2 = \sum_c \frac{\left[\sum X_c\right]^2}{N_c} - \frac{\left(\sum X_i\right)^2}{N} \qquad (9)$$

in which $[\sum X_c]^2$ represents the square of the sum of the indi-
vidual items in the cth column. Table 55 shows that this third
sum of squares is equal to

$$\frac{6{,}609{,}694}{20} - \frac{(4{,}439)^2}{60} = 330{,}484.70 - 328{,}412.02 = 2{,}072.68$$

The fourth sum of squares is best calculated indirectly. The
first step is to calculate the weighted sum of the squared devia-
tions of the cell means from the grand mean. This may be
obtained from the equation

$$\sum_r\sum_c N_{rc}(\bar{X}_{rc}-\bar{X})^2 = \sum_r\sum_c N_{rc}\bar{X}_{rc}^2 - \frac{\left(\sum X_i\right)^2}{N} \qquad (10)$$


Since $N_{rc}$ is constant in the present problem, the first term on the
right may be written $N_{rc}\sum_r\sum_c \bar{X}_{rc}^2$. To calculate this term
it is thus necessary only to square each of the cell means, sum the
results, and multiply by $N_{rc}$. This squaring and summing of the
cell means was already done in Table 50, however,¹ where it was
found that $\sum\sum \bar{X}_{rc}^2$ (designated as $\sum X^2$ in Table 50) equaled
82,850.1875. Accordingly, as indicated in Table 55, the weighted
sum of the squared deviations of the cell means about the grand
mean is equal to

$$N_{rc}\sum_r\sum_c \bar{X}_{rc}^2 - \frac{\left(\sum X_i\right)^2}{N} = 4(82{,}850.1875) - 328{,}412.02 = 2{,}988.73$$

The second step is to subtract from 2,988.73 the weighted sums
of the squared deviations of the means of the rows and of the
means of the columns. The result, according to a modification
of Eq. (4), is the fourth sum of squares.

¹ Because, in Table 50, the cell means of Table 54 were treated as if they
were individual items, the sum of the squared items in Table 50 is
equivalent to the sum of the squared means of Table 54.
Thus

$$\sum_r\sum_c N_{rc}(\bar{X}_{rc}-\bar{X}_r-\bar{X}_c+\bar{X})^2 = 2{,}988.73 - 2{,}072.68 - 617.23 = 298.82$$


The final, or remainder, sum of squares is now obtained by
subtraction of all the other component sums of squares from the
total sum of squares. That is,

$$\sum_r\sum_c\sum_i (X_i-\bar{X}_{rc})^2 = 6{,}322.98 - 2{,}072.68 - 617.23 - 298.82 = 3{,}334.25$$
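The indirect route to the fourth sum of squares, followed by the division by degrees of freedom that produces Table 56, can be sketched in Python. The sketch takes as given the figures quoted in the text (the total, row, and column sums of squares and the 82,850.1875 from Table 50); the dictionary layout is only an illustrative choice.

```python
# Figures quoted in the text.
correction = 4439 ** 2 / 60                 # (sum X)^2 / N = 328,412.02
total_ss = 6322.98                          # Eq. (7)
row_ss, col_ss = 617.23, 2072.68            # rows, Eq. (8); columns, Eq. (9)
cells_ss = 4 * 82850.1875 - correction      # Eq. (10) with N_rc = 4
inter_ss = cells_ss - row_ss - col_ss       # fourth sum of squares, by subtraction
remainder_ss = total_ss - row_ss - col_ss - inter_ss

r, c, N = 5, 3, 60
df = {"rows": r - 1, "columns": c - 1, "interaction": (r - 1) * (c - 1), "remainder": N - r * c}
ss = {"rows": row_ss, "columns": col_ss, "interaction": inter_ss, "remainder": remainder_ss}
ms = {k: ss[k] / df[k] for k in ss}                        # estimates of sigma^2
F = {k: ms[k] / ms["remainder"] for k in ("rows", "columns", "interaction")}

for k in ss:
    print(f"{k:<12} {ss[k]:>9.2f} {df[k]:>3} {ms[k]:>9.2f}")
print({k: round(v, 3) for k, v in F.items()})
```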


All the foregoing sums of squares are collected in Table 56, 
where they are divided by the appropriate degrees of freedom to 
obtain various estimates of the population variance. Compari- 
sons of these estimates afford tests of the various hypotheses to 
be considered. 


TABLE 56.—ANALYSIS OF VARIANCE

Source of variation               Sums of     Degrees of        Estimates
                                  squares     freedom           of σ²
Means of rows                       617.23    5 − 1 = 4            154.31
Means of columns                  2,072.68    3 − 1 = 2          1,036.34
Means of cells, or interaction      298.82    4 × 2 = 8             37.35
Remainder                         3,334.25    60 − 15 = 45          74.09


To test the null hypothesis that the variation in the row means
is no greater than can reasonably be attributed to chance, i.e.,
that the difference in teaching has no effect on grades, the ratio of
the first estimate to the last estimate is computed. This is
found to be 2.083. The .05 point of the F distribution for $n_1 = 4$
and $n_2 = 45$ lies between 2.53 and 2.69. Since the sample value
of 2.083 is obviously less than this .05 point and hence does not
lie in the region of rejection, the null hypothesis is accepted in
this instance. The fluctuations in grades from teacher to teacher
are thus apparently due to chance, and there is no basis for
attributing them to a difference in teaching. This conclusion con-
tradicts the result of the previous problem, but since it is based
upon a larger set of data, greater confidence is to be placed in its
validity than in the result obtained in the preceding problem.
To test the null hypothesis that the variation in the column
means is due to chance, i.e., that the difference in general
standing has no effect upon student grades in the given course,
the ratio of the second estimate to the last estimate is computed.
This gives 13.99, which is far beyond the .05 point of the F dis-
tribution for $n_1 = 2$ and $n_2 = 45$.* Hence the null hypothesis
must be rejected, and it can be concluded that general standing
does have an effect upon the grades obtained in the given course.
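How far the two sample ratios lie from the .05 points can be made explicit by computing the upper-tail probability of F from the regularized incomplete beta function. The sketch below is a self-contained Python version using the standard continued-fraction expansion (the form popularized by Numerical Recipes) rather than a statistical library. The function names are hypothetical. For $n_1 = 2$ the tail has the closed form $(1 + 2F/n_2)^{-n_2/2}$, which serves as a check.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated by the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for coeff in (even, odd):
            d = 1.0 + coeff * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coeff / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(ln_front) * _beta_cf(a, b, x) / a
    return 1.0 - math.exp(ln_front) * _beta_cf(b, a, 1.0 - x) / b

def f_upper_tail(f, n1, n2):
    """P(F >= f) for the F distribution with n1 and n2 degrees of freedom."""
    return reg_inc_beta(n2 / 2.0, n1 / 2.0, n2 / (n2 + n1 * f))

def f_critical(alpha, n1, n2, lo=0.0, hi=1000.0):
    """Upper alpha point of F, found by bisection (the tail decreases in f)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, n1, n2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(f_upper_tail(2.083, 4, 45), 4), f_upper_tail(13.99, 2, 45))
print(round(f_critical(0.05, 4, 45), 3))
```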

The variation in the cell means may be tested in a like manner 
by comparing the estimate of the population variance based 
upon this variation with that based upon the remainder variance. 
In the present problem the ratio is obviously less than 1, so that
further analysis is unnecessary. The variation in cell means
may be attributed to chance, and there is no basis for concluding
that students of different standing react differently to different
teachers.


TESTS OF CORRELATION COEFFICIENTS
AS AN ANALYSIS OF VARIANCE


In Chap. XII certain tests were described for determining whether
correlation coefficients were significantly different from zero.
These tests were in reality an analysis of variance. The variance
of points on a line of regression was given by $\sigma_1^2 r_{12}^2$, the
variation of the means of rows or columns was given by $\sigma_1^2\eta_{12}^2$,
and these variances were taken as measures of that correlation.
If there was no correlation, sample data might still yield values
for these measures, but such values would be merely the result
of chance. Whether or not correlation exists, however, the
variation around the line, plane, or means may be taken as a
measure of the amount of variation caused by chance forces.

¹ For further discussion of special problems dealing with the analysis of
variance, see R. A. Fisher, Statistical Methods for Research Workers (1932);
also C. H. Goulden, Methods of Statistical Analysis (1939).
² See p. 21.
³ See Smith, J. G., and A. J. Duncan, Elementary Statistics and Applications.

Accordingly, to determine whether variation accounted for 
by a line, plane, curve, or progression of means is due to correla- 
tion or is merely the result of chance, it may be compared with 
the variation around the line, plane, curve, or progression of
means. More specifically, the null hypothesis may be set up that 
all variation is due to chance. Then an estimate of the popula- 
tion variance based upon the variation of the points on the curve, 
line, or plane or upon the variation of the mean values may be 
compared with a similar estimate based upon the variation 
around these measures of correlation. In all cases, the ratio of 
these two estimates will be distributed like the F distribution. 




For a line of regression, for example, the quantity $N\sigma_1^2 r_{12}^2/1$
is a maximum-likelihood, or unbiased, estimate of the population
variance (assuming no correlation) based on variation along the
line of regression; and the quantity $N\sigma_1^2(1 - r_{12}^2)/(N - 2)$ is a
maximum-likelihood, or unbiased, estimate of the population
variance based on variation around the line of regression. Since
these two estimates have independent sampling distributions,¹
their ratio has a sampling distribution of the form of the F
distribution, with $n_1 = 1$ and $n_2 = N - 2$. For a plane of
regression, the two maximum-likelihood estimates of the
population variance are $N\sigma_1^2 R_{1.23\cdots}^2/(k - 1)$ and
$N\sigma_1^2(1 - R_{1.23\cdots}^2)/(N - k)$, and their ratio is distributed like
the F distribution with $n_1 = k - 1$ and $n_2 = N - k$, where k is the
number of regression statistics $a, b_2, b_3, \ldots$ used in the
regression equation. For a progression of means the two
maximum-likelihood estimates are $N\sigma_1^2\eta_{12}^2/(k - 1)$ and
$N\sigma_1^2(1 - \eta_{12}^2)/(N - k)$, and their ratio is distributed like the
F distribution with $n_1 = k - 1$ and $n_2 = N - k$, k being the
number of means. Similar formulas apply to a curvilinear
regression. The statistics that were used in Chap. XII to test the
significance of the multiple correlation coefficient, the correlation
ratio, and the correlation index will thus be recognized as the
ratio of two maximum-likelihood estimates of the population
variance (assuming no correlation), and the analysis there given
will be recognized as being essentially an analysis of variance.

¹ This cannot be proved here.
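For a straight line the two estimates are sums of squares along and around the line, and their ratio works out to $r^2(N - 2)/(1 - r^2)$. The Python sketch below uses made-up data and computes the two sums of squares from the fitted line rather than from r, so the agreement is a genuine check. It assumes the divisor-N variance $\sigma_1^2$ used above, so that $N\sigma_1^2$ is the total sum of squares of the dependent variable. The function name is hypothetical.

```python
def line_f_ratio(x, y):
    """Return (F, r^2) for a straight line of regression of y on x:
    F = [variation along the line / 1] / [variation around the line / (N - 2)]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)                  # N * sigma_1^2
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)
    slope = sxy / sxx
    fitted = [my + slope * (a - mx) for a in x]
    along = sum((fv - my) ** 2 for fv in fitted)                # N sigma_1^2 r^2
    around = sum((yy - fv) ** 2 for yy, fv in zip(y, fitted))   # N sigma_1^2 (1 - r^2)
    return (along / 1) / (around / (n - 2)), r2

# Made-up data for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.0, 7.2, 7.9]
f, r2 = line_f_ratio(x, y)
print(round(f, 4), round(r2 * (len(x) - 2) / (1 - r2), 4))
```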

The test of linearity was also an analysis of variance, but the 
assumptions are somewhat different. In this case a linear
correlation is assumed to exist, but not a curvilinear correlation. 
On the basis of this assumption an estimate of the population 
first-order variance is made from the variation of the means— 
or curve if such is used—from the line of regression and another 
estimate is made from the variation around the means—or 
around the curve of regression. These two estimates have 
independent sampling fluctuations, and their ratio is thus dis- 
tributed like the F distribution. More specifically, 


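$$\frac{N(\sigma_1^2\eta_{12}^2 - \sigma_1^2 r_{12}^2)}{k-2} \quad\text{or}\quad \frac{N(\sigma_1^2\iota_{12}^2 - \sigma_1^2 r_{12}^2)}{k-2}$$

is a maximum-likelihood estimate of the population first-order
variance based on the variation of the means or curve around the
line of regression; and

$$\frac{N\sigma_1^2(1 - \eta_{12}^2)}{N-k} \quad\text{or}\quad \frac{N\sigma_1^2(1 - \iota_{12}^2)}{N-k}$$

is a maximum-likelihood estimate of the population first-order
variance based on the variation around the means or curve of
regression, k being the number of means or the number of
regression statistics. The ratios of the first to the second of these
estimates, for means and for a curve respectively, are distributed
like the F distribution with $n_1 = k - 2$ and $n_2 = N - k$. If the
reader turns to p. 307, he will note that this ratio is the statistic
that was used to test the linearity of any regression. It will thus
be seen that this test is another form of an analysis of variance.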
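The linearity ratio can be sketched in Python from grouped data. The common factor $N\sigma_1^2$ cancels, leaving $[(\eta^2 - r^2)/(k - 2)]/[(1 - \eta^2)/(N - k)]$. The two small data sets are invented: one has column means lying exactly on a line, so F is zero, and the other has means that rise and then fall. The function name is an assumption for the example.

```python
def linearity_f(groups):
    """groups maps each value of x to the list of y values observed there.
    Returns (F, eta^2, r^2); F has k - 2 and N - k degrees of freedom."""
    xs = [x for x, ys in groups.items() for _ in ys]
    ys = [y for vals in groups.values() for y in vals]
    n, k = len(ys), len(groups)
    mx, my = sum(xs) / n, sum(ys) / n
    syy = sum((y - my) ** 2 for y in ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    between = sum(len(v) * (sum(v) / len(v) - my) ** 2 for v in groups.values())
    eta2 = between / syy                # squared correlation ratio
    r2 = sxy ** 2 / (sxx * syy)         # squared correlation coefficient
    f = ((eta2 - r2) / (k - 2)) / ((1 - eta2) / (n - k))
    return f, eta2, r2

straight = {1: [1, 3], 2: [3, 5], 3: [5, 7]}             # means 2, 4, 6: on a line
curved = {1: [1, 2], 2: [5, 6], 3: [5, 6], 4: [1, 2]}    # means rise, then fall
print(linearity_f(straight)[0], linearity_f(curved)[0])
```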


CHAPTER XVIII 
THE PROBLEM OF NONNORMALITY 


Up to this point most of the discussion has been based upon
the assumption that the sampled population is normally dis- 
tributed. This would appear to be a serious restriction, inas- 
much as data that are not normally distributed are frequently 
encountered. It is the purpose of this chapter, therefore, to 
consider the effect of a departure from normality upon the 
sampling distributions of some of the more important statistics. 


EXACT SAMPLING DISTRIBUTIONS

Exact sampling distributions have been worked out for certain
statistics from particular nonnormal populations.¹ The popu-
lations studied have been of all sorts: rectangular, triangular,
and U-shaped populations, populations that conform to a type A
Gram-Charlier series, populations that conform to a type III
Pearsonian curve, and the like. The distributions derived were
mainly for the mean and standard deviation.

Exact Distributions of the Mean. The distribution of the mean
for samples of size N from a rectangular population, $y = 1/a$,
$x = 0$ to $x = a$, has been found to consist of a series of N poly-
nomials, each of degree N − 1 and applicable to a subinterval of
length a/N. The curve is bell shaped and resembles the normal
curve when N > 3. For N = 2, the curve reduces to an isosceles
triangle.² It has also been found that the distribution of means
from a Pearsonian type III population is a type III curve,³ and
distributions of means have been worked out for samples from
triangular and U-shaped populations.¹

¹ This section is based primarily upon H. L. Rietz, "Topics in Sampling
Theory," Bulletin of the American Mathematical Society, Vol. 43 (1937), pp.
209–230.
² Ibid., p. 219.
³ Cf. Church, A. E. R., "On the Means and Standard Deviations of
Small Samples from any Population," Biometrika, Vol. 18 (1926), pp. 321–394,
and Craig, C. C., "Distribution of Means of Samples from a Type A
Population," Annals of Mathematical Statistics, Vol. 2 (1931), pp. 99–101.
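The piecewise-polynomial result for the rectangular population can be made concrete with the classical formula for the density of the sum of N independent rectangular variates (often called the Irwin–Hall distribution), rescaled to the mean. The Python function below is a sketch based on that formula. Its name and the default a = 1 are illustrative assumptions. For N = 2 it gives the isosceles triangle with peak 2/a at the centre.

```python
from math import comb, factorial, floor

def mean_density_rectangular(xbar, n, a=1.0):
    """Exact density of the mean of n items drawn from the rectangular
    population y = 1/a on 0 <= x <= a."""
    s = n * xbar / a                  # the corresponding sum of n unit-rectangular variates
    if s < 0 or s > n:
        return 0.0
    # Density of the sum: a different polynomial of degree n - 1 on each unit
    # interval of s, i.e. on each subinterval of length a/n for the mean.
    f_sum = sum((-1) ** k * comb(n, k) * (s - k) ** (n - 1)
                for k in range(floor(s) + 1)) / factorial(n - 1)
    return f_sum * n / a              # change of variable from the sum to the mean

for n in (2, 3, 5):
    print(n, [round(mean_density_rectangular(x / 10, n), 4) for x in range(11)])
```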

Integrations for distributions of means have been worked out
for means of samples from the following populations:²

1. A rectangular population represented by
$$f(x) = \frac{1}{a} \qquad (0 \le x \le a)$$
2. A J-shaped population represented by the declining expo-
nential curve ($0 \le x < \infty$).
3. A positively skewed population represented by the Pearson-
ian type III or χ² type curve ($0 \le x < \infty$).
4. A peaked population represented by the double declining
exponential curve ($-\infty < x < \infty$), for N = 2, 3, and 4 only.
5. A triangular-shaped population represented by the two
straight lines
$$f(x) = \frac{4x}{a^2} \quad\text{when } 0 \le x \le \frac{a}{2}$$
and
$$f(x) = \frac{4}{a^2}(a - x) \quad\text{when } \frac{a}{2} \le x \le a$$
for N = 2 only.

¹ Cf. Rider, Paul, "On Small Samples from Certain Nonnormal Uni-
verses," Annals of Mathematical Statistics.
² See Craig, A. T., "On the Distributions of Certain Statistics," American
Journal of Mathematics, Vol. 54 (1932), pp. 353–366.
³ See Rider, op. cit.

Exact Distributions of Standard Deviation and Variance. For
samples of 2 from a rectangular population (represented by
$y = 1$, $x = 0$ to $x = 1$) it has been found³ that the distribution
of σ was $f(\sigma) = 4(1 - 2\sigma)$. The exact distribution has also been
found for the standard deviation of samples of 3 from a rectangu-
lar population.¹
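The formula $f(\sigma) = 4(1 - 2\sigma)$ corresponds to σ computed with divisor N, so that for a sample of two $\sigma = |x_1 - x_2|/2$ and $0 \le \sigma \le \tfrac{1}{2}$. A small Monte Carlo check in Python (the seed and number of trials are arbitrary choices) compares the simulated proportion of samples with σ ≤ .25 against the exact value $\int_0^{.25} 4(1 - 2\sigma)\,d\sigma = .75$.

```python
import math
import random

def sigma_of_pair(x1, x2):
    """Standard deviation, with divisor N = 2, of a sample of two items."""
    m = (x1 + x2) / 2
    return math.sqrt(((x1 - m) ** 2 + (x2 - m) ** 2) / 2)

def exact_cdf(s):
    """Integral of f(sigma) = 4(1 - 2 sigma) from 0 to s, for 0 <= s <= 1/2."""
    return 4 * s - 4 * s ** 2

rng = random.Random(12345)
trials = 200_000
share = sum(sigma_of_pair(rng.random(), rng.random()) <= 0.25 for _ in range(trials)) / trials
print(round(share, 4), exact_cdf(0.25))
```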

Studies of sampling experiments with a Pearsonian type III
population suggest that the distribution of sample variances from
such a population may be adequately described by a Pearsonian
type VI distribution.² Studies have also been made of the
moments of the sampling distribution of the variance with a view
to obtaining Pearsonian curves to represent the sampling dis-
tribution when the population itself is a Pearsonian type curve.³
By this analysis, light is shed on how the sampling distribution
changes with changes in population type and changes in size of
sample.

Exact Distributions of the Statistic t. Studies have been made
of the distribution of sample t's from a rectangular distribution.
For samples of 2 the distribution of t was obtained in closed form;
for samples of 3 it was much more complicated. These studies
indicated that the distribution of t was more peaked, i.e., had
more cases near the middle and at the ends, when the population
was rectangular than when it was normal.⁴ It has also been found
that the distribution of t from a U-shaped population was similar
to that for a rectangular population.⁵

¹ See Rietz, H. L., "Note on the Distribution of the Standard Deviation
of Sets of Three Variates Drawn at Random from a Rectangular Popula-
tion," Biometrika, Vol. 23 (1931), pp. 424–426.
² See Sophister, "Discussion of Small Samples Drawn from an Infinite
Skew Population," Biometrika, Vol. 20A (1928), pp. 389–423.
³ Le Roux, J. M., "A Study of the Distribution of the Variance in Small
Samples," Biometrika, Vol. 23 (1931), pp. 134–190.
⁴ Perlo, V., "On the Distribution of Student's Ratio for Samples of Three
Drawn from a Rectangular Distribution," Biometrika, Vol. 25 (1933), pp.
203–204.
⁵ See Rider, op. cit.

In summarizing a recent account of sampling from nonnormal
populations, Rietz concludes as follows: "Although the prospects
of obtaining the exact distribution functions of such statistics as
the standard deviation s or the 'Student' ratio z for samples from
a considerable variety of nonnormal populations do not seem
promising, nevertheless, by the use of moments of moments, and
experimental sampling, along with the exact determination of
some distribution functions, significant contributions are being
made toward an understanding of the probable nature of certain
important features of the distributions in question."¹

The Use of Normal Theory for Nonnormal Populations. In 
cases in which the population is nonnormal, but the exact forms 
of the various sampling distributions are not known, it is quite a 
common practice to view the results obtained on the assumption 
that the population is normal as fairly good approximations to 
those that would be actually obtained from the nonnormal popu- 
lation. The extent to which this practice is justified will now be 
considered. 

When the sample is large, as has been demonstrated above, the
distribution of sample means will tend to be normal in form,
with a mean equal to the mean of the population and a variance
equal to the variance of the population divided by N.² What-
ever the nature of the population, the moments of the distribu-
tion of sample means are³

$$\mu_{2\bar{X}} = \frac{\mu_2}{N}, \qquad \mu_{3\bar{X}} = \frac{\mu_3}{N^2}, \qquad \mu_{4\bar{X}} = \frac{\mu_4}{N^3} + \frac{3(N-1)}{N^3}\,\mu_2^2 \qquad (1)$$

from which it follows that

$$\beta_{1\bar{X}} = \frac{\beta_1}{N}, \qquad \beta_{2\bar{X}} - 3 = \frac{\beta_2 - 3}{N} \qquad (2)$$

These suggest a fairly rapid approach to normality in general,
while experiments in sampling from nonnormal populations that
have only a moderate degree of skewness and kurtosis, for
example, when $\beta_1 = .2$ and $\beta_2 - 3 < .4$, suggest that the sam-
pling distribution of the mean approaches normality very
rapidly.⁴ This appears to be true even of quite small samples.
Similar statements may be made for a regression coefficient, the
difference between two means, and other statistics that are
linear functions of the sample data. It is assumed in all these
cases that the variance of the population is known.

¹ Rietz, H. L., "Topics in Sampling Theory," Bulletin of the American
Mathematical Society, Vol. 43 (1937), p. 230.
² See pp. 262–263.
³ $\mu_{2\bar{X}}$ means $\mu_2$ for $\bar{X}$, $\mu_{3\bar{X}}$ means $\mu_3$ for $\bar{X}$, etc. Similarly,
$\beta_{1\bar{X}}$ means $\beta_1$ for $\bar{X}$, etc.
⁴ Cf. Church, op. cit.
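Equations (1) and (2) are exact for independent sampling, whatever the population. This can be confirmed without approximation by enumerating every ordered sample from a small discrete population in exact rational arithmetic. The three-point population and N = 3 in the Python sketch below are arbitrary, hypothetical choices.

```python
from fractions import Fraction
from itertools import product

def central_moments(values, probs):
    """Exact 2nd, 3rd, and 4th central moments of a discrete distribution."""
    mean = sum(v * p for v, p in zip(values, probs))
    return {k: sum(p * (v - mean) ** k for v, p in zip(values, probs)) for k in (2, 3, 4)}

# Hypothetical skewed population: values 0, 1, 4 with probabilities 1/2, 1/3, 1/6.
pop = [Fraction(0), Fraction(1), Fraction(4)]
prob = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
N = 3

# Every ordered sample of N items drawn independently (with replacement),
# with its probability, gives the exact distribution of the sample mean.
means, weights = [], []
for idx in product(range(len(pop)), repeat=N):
    means.append(sum(pop[i] for i in idx) / N)
    w = Fraction(1)
    for i in idx:
        w *= prob[i]
    weights.append(w)

mu = central_moments(pop, prob)           # mu_2, mu_3, mu_4 of the population
mu_bar = central_moments(means, weights)  # the same moments for the sample mean
print(mu_bar[2] == mu[2] / N,
      mu_bar[3] == mu[3] / N ** 2,
      mu_bar[4] == (mu[4] + 3 * (N - 1) * mu[2] ** 2) / N ** 3)   # prints: True True True
```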


The Statistic t. When the variance of the population is not
known, it will be recalled that fluctuations in the mean are
analyzed through fluctuations in the statistic
$t = \sqrt{N}\,(\bar{X} - \mu)/\hat{\sigma}$, in which $\mu$ is the population mean and
$\hat{\sigma}$ is the estimate of the population standard deviation obtained
from the sample. A number of studies have been made of the
sampling fluctuations in this statistic when the population is not
normal.¹ When the population was rectangular or U-shaped, it
was found that the distribution of t failed to agree with the dis-
tribution of t for samples from normal populations, especially
near the middle and at the ends of the distribution.² In general
it has been found that the agreement between t for nonnormal
samples and t for normal samples is poorest when the nonnormal
populations are very peaked.³ On the other hand, in the case of
skew triangular populations it was found that, although the
distribution of t was skewed, when the cumulative probability
was considered (i.e., when the two tail areas were added) the
results were about the same as for a normal population.⁴ This
suggests that in the case of skew populations, at least, the use
of the normal theory may not give bad approximations to the
correct results. The danger of error apparently is greatest in
very peaked populations.

¹ See references for section above, pp. 446–447.
² Also see Shewhart, W. A., and F. W. Winters, "Small Samples—New
Experimental Results," Journal of the American Statistical Association,
Vol. 23 (1928), pp. 144–153.
³ Based on an extensive investigation of Egon S. Pearson. See "The
Distribution of Frequency Constants in Small Samples from Nonnormal
Symmetrical and Skew Populations," Biometrika, Vol. 21 (1929), pp. 259–286.
⁴ See Rider, op. cit.

Analysis of Variance. Although the foregoing conclusions are
based on experiments relating to sample means and regression
coefficients, they apply equally well to analyses of variance in
which $n_1 = 1$, the case in which the sampling distribution of F,
or more precisely of $\sqrt{F}$, reduces to the t distribution or half the
t distribution because $\sqrt{F}$ has no sign. That such conclusions
can be extended to other analyses of variance requires additional
evidence. This has already been provided in part by other
studies. From a study of experimental data pertaining to a set of
means grouped according to one principle of classification, Egon S.
Pearson concludes that the analysis based on the assumption of
normality gives fairly satisfactory results even when the
population is not normal.¹ This, he points out, arises from the
correlation that may exist between the variation in the means
and the variation about the means when the population is
nonnormal.

Variances. General equations have also been derived for the
moments and β-coefficients of the sampling distribution of the
variance.² These also suggest that the sampling distribution of
the variance tends toward the normal form as N increases. The
approach to normality is not very rapid, however, and it is not
safe to assume that the distribution of the variance is normal
unless the sample is quite large. Furthermore, experimental
evidence indicates that for smaller samples from certain types of
nonnormal populations the sampling distribution of the variance
does not necessarily approximate the χ² distribution, which is
the form it takes when the population is normal.

Accordingly, it has been concluded by students of the subject
that "we are liable to go far wrong when using the 'normal
theory' variance distribution to represent D(s²) [i.e., the sam-
pling distribution of the variance] in samples from nonnormal
parent populations. . . ."³ There is some doubt also as to the
adequacy of the F distribution for testing the ratio of two
independent sample variances from nonnormal populations.⁴

¹ Pearson, Egon S., "Analysis of Variance in Cases of Nonnormal Varia-
tion," Biometrika, Vol. 23 (1931), pp. 114–133.
² These are conveniently summarized by Le Roux, op. cit.
³ Ibid., p. 189. The use of the moments of the sampling distribution of
the variance to describe its general form is bas . . . curve. C. A. Baker has
. . . He has derived a general formula . . . n of the Standard Deviations and
Second Moments of Samples from a Gram-Charlier Population," Annals of
Mathematical Statistics, Vol. 2 (1931), pp. 48–65. Little use has yet been made
of this approach to the problem.
⁴ Cf. Pearson, Biometrika, Vol. 23 (1931), pp. 114–133.
Correlation Coefficients. The distribution of the correlation
coefficient of samples from nonnormal populations has been
studied primarily by sampling experiments. When the popula-
tion correlation coefficient is zero, these experiments suggest that
the distribution of the correlation coefficient has approximately
the same form as when the population is normal.¹

When the population correlation coefficient is not zero, on
the other hand, and when the samples are small, the agreement
between the experimental distributions derived from nonnormal
populations and the theoretical distribution for a normal popula-
tion does not appear to be so good, although the closeness of the
agreement improves as the size of the sample increases.² These
conclusions are based on a very limited set of data and can at
most be considered tentative.


INDIRECT ATTACKS ON THE PROBLEM OF NONNORMALITY 

Several indirect attacks on the problem of nonnormality have 
been attempted. One of these is the transformation of the 
original data. It has long been recognized, for example, that in 
certain cases the logarithms of given data are normally dis- 
tributed although the data themselves are not. In such cases, 


the problem of nonnormality may readily be solved by a logarith-
mic transformation. A general transformation by which any
nonnormal distribution might be made approximately normal
has been proposed, and it has been suggested that the sampling
distributions of estimates of the parameters of the nonnormal
distributions might be expressed in terms of the transformation
and the sampling distributions of estimates of the parameters
of the normal distribution into which the nonnormal distribution
has been transformed.³ This line of attack, however, has not
as yet been developed to a point where it can be put to practical
use.

¹ Pearson, Egon S., "The Test of Significance for the Correlation Coeffi-
cient," Journal of the American Statistical Association, Vol. 26 (1931), pp.
128–134.
² Cheshire, Leone, Elena Oldis, and Egon S. Pearson, "Further
Experiments on the Sampling Distribution of the Correlation Coefficient,"
Journal of the American Statistical Association, Vol. 27 (1932), pp. 121–128.
³ Cf. Annals of Mathematical Statistics, Vol. 3 (1932), pp. 113–123.

Another line of attack has turned its attention to the qualita-
tive or semiqualitative aspects of the data and by thus ignoring
some of the quantitative aspects has avoided the necessity of
special assumptions as to the form of the population. Instead 
of determining, for example, whether two samples are from the 
same population by testing the difference between their means, 
variances, etc., it would be possible to group the data into classes 
and to apply a χ² test of independence. A method involving no
assumption as to population form would thus be substituted for 
one necessitating such an assumption. It is to be noted, how- 
ever, that this qualitative method is not as efficient as the 
quantitative method, since a certain amount of the information 
provided by the data is ignored. If the population appears to be 
normal, therefore, the more refined methods based upon normal 
assumptions are to be preferred. The qualitative method 
requires also that sufficient data be available to permit grouping 
to test for independence. 
For large samples a semiqualitative method is available for
testing the existence of correlation. If in correlating $X_1$ and $X_2$,
for example, only the rank or order of size of the variables is
considered, it is possible to compute a coefficient of "rank
correlation" whose significance may be tested without any
assumption as to the form of the $X_1X_2$ population. If $d_i$ repre-
sents the difference in rank between sample pairs of $X_1$ and $X_2$
values, then the coefficient of rank correlation is measured by

$$\rho = 1 - \frac{6\sum d_i^2}{N(N^2 - 1)}$$

For a population rank correlation of zero this has a sampling
distribution that is approximately normal in form. The mean
of this sampling distribution is zero, and its standard deviation
is $\sigma_\rho = 1/\sqrt{N - 1}$. The approximation is valid only for large
samples. For a fuller discussion of this procedure the student is
referred to the article by Hotelling and Pabst.¹
It may be noted, however, that the discarding of certain
information again reduces the efficiency of the test, although it
is contended that the loss of efficiency is not very great, not more
than, say, 9 per cent. It is also to be noted that this method
merely affords a test of the existence of correlation. It does not
offer a substitute for the sampling distribution of the Pearsonian
correlation coefficient when it is to be assumed that the
population correlation coefficient is other than zero.

¹ "Rank Correlation and Tests of Significance Involving No Assumption
of Normality," Annals of Mathematical Statistics, Vol. 7 (1936), pp. 29–43.
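The rank-correlation test can be sketched in a few lines of Python: rank each variable, apply the formula for ρ, and divide by the large-sample standard deviation $1/\sqrt{N - 1}$ to obtain a normal deviate. The data are made up, and the simple formula (and this sketch) assume there are no tied values. The function names are hypothetical.

```python
import math

def ranks(values):
    """Ranks 1..N in order of size; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def rank_correlation(x1, x2):
    """rho = 1 - 6 * sum(d^2) / (N (N^2 - 1)), d being the rank differences."""
    n = len(x1)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x1), ranks(x2)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def rank_correlation_z(rho, n):
    """Normal deviate rho / sigma_rho, with sigma_rho = 1/sqrt(N - 1)."""
    return rho * math.sqrt(n - 1)

x1 = [3.1, 4.7, 2.2, 5.9, 4.0, 6.3]    # made-up data
x2 = [12, 15, 9, 20, 16, 18]
rho = rank_correlation(x1, x2)
print(round(rho, 4), round(rank_correlation_z(rho, len(x1)), 3))
```

With only six pairs this is, of course, far too small a sample for the normal approximation the text requires; the numbers merely show the arithmetic.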

It has also been suggested that the principle of rank correlation
may be used to solve the problem of nonnormality in the analysis
of variance.¹ This would involve the replacement of the actual
value of a case by its rank in the conventional analysis of variance
table. With such a plan, the efficiency of the test is again
reduced in comparison with the ordinary analysis of variance
applied to a sample from a normal population. This may be
offset by the inaccuracy introduced when the usual test is applied
to nonnormal populations. A great advantage of the rank
method of handling analysis of variance is that it greatly reduces
the amount of arithmetical computation involved in the test,
since it converts all figures into simple small integers.
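One way of carrying out the rank method for a two-way table is the statistic Friedman proposed in the paper cited. The values are ranked within each row, the ranks are totaled by columns, and $\chi_r^2 = 12\sum R_j^2/[nk(k + 1)] - 3n(k + 1)$ is formed. For large samples, $\chi_r^2$ is distributed approximately like χ² with k − 1 degrees of freedom. The Python sketch below assumes no ties within a row and uses invented scores. The function names are hypothetical.

```python
def row_ranks(row):
    """Ranks 1..k within one row; assumes no ties inside the row."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    result = [0] * len(row)
    for rank, j in enumerate(order, start=1):
        result[j] = rank
    return result

def friedman_chi2(table):
    """Friedman's chi^2_r for n rows (blocks) and k columns (treatments)."""
    n, k = len(table), len(table[0])
    ranked = [row_ranks(row) for row in table]
    col_sums = [sum(r[j] for r in ranked) for j in range(k)]
    return 12 * sum(s * s for s in col_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)

# Made-up scores: 4 rows, 3 columns, every row ordered the same way.
consistent = [[61, 70, 88], [55, 64, 79], [70, 72, 90], [48, 66, 81]]
# Rows whose orderings cancel out across the columns.
balanced = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
print(friedman_chi2(consistent), friedman_chi2(balanced))
```

Only small integers enter the arithmetic, which illustrates the computational advantage mentioned above.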


¹ Friedman, Milton, "The Use of Ranks to Avoid the Assumption of
Normality," Journal of the American Statistical Association, Vol. 32 (1937),
pp. 675–701.


TCHÉBYCHEF'S AND OTHER INEQUALITIES

An interesting attack on the problem of nonnormality has been
made through the establishment of upper limits of probability for
specified sampling fluctuations that are valid for almost any
type of sampling distribution. The best known of these attempts
is Tchébychef's inequality. This says that the probability of a
variable deviating from its mean by as much as λσ is equal to or
less than 1/λ². For example, if the mean of a variable is 100 and
its standard deviation is 10, the probability that the variable will
deviate from 100 by as much as 3σ (that is, by 30 or more) is equal
to or less than 1/9 = .11. The basis for this conclusion is as
follows. By definition,

$$\sigma^2 = p_1(X_1 - \bar{X})^2 + p_2(X_2 - \bar{X})^2 + \cdots \qquad (3)$$

Let X′, X″, . . . and p′, p″, . . . represent those values of the
variable, and their probabilities, that differ from X̄ by as much
as λσ. Then, since the part is less than or at the most equal to
the whole,

$$\sigma^2 \ge p'(X' - \bar{X})^2 + p''(X'' - \bar{X})^2 + \cdots \qquad (4)$$

or, if (λσ)² is put in place of (X′ − X̄)², (X″ − X̄)², . . . , which
are equal to or greater than (λσ)²,

$$\sigma^2 \ge p'(\lambda\sigma)^2 + p''(\lambda\sigma)^2 + \cdots = (p' + p'' + \cdots)(\lambda\sigma)^2 \qquad (5)$$

$$P_{\lambda\sigma} \le \frac{1}{\lambda^2} \qquad (6)$$

where $P_{\lambda\sigma} = p' + p'' + \cdots$ is the probability of a deviation
equal to or greater than λσ.¹
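Because inequality (6) must hold for every population with finite variance, it is usually far from the actual tail probability. The Python sketch below, an illustration with three populations chosen only for convenience (normal, the exponential density $e^{-x}$ with mean 1 and σ = 1, and a rectangular population on [0, 1]), compares the bound with the exact two-tail probabilities. The function names are hypothetical.

```python
import math

def tchebychef_bound(lam):
    """Eq. (6): P(|X - mean| >= lam * sigma) <= 1 / lam^2."""
    return 1.0 / lam ** 2

def normal_tail(lam):
    """Exact two-tail probability beyond lam * sigma for a normal population."""
    return math.erfc(lam / math.sqrt(2.0))

def exponential_tail(lam):
    """Same probability for the density e^(-x), x >= 0 (mean 1, sigma 1)."""
    upper = math.exp(-(1.0 + lam))                               # P(X >= 1 + lam)
    lower = 1.0 - math.exp(-(1.0 - lam)) if lam < 1.0 else 0.0   # P(X <= 1 - lam)
    return upper + lower

def rectangular_tail(lam):
    """Same probability for a rectangular population on [0, 1] (sigma = 1/sqrt(12))."""
    return max(0.0, 1.0 - 2.0 * lam / math.sqrt(12.0))

for lam in (1.5, 2.0, 3.0):
    print(lam, round(tchebychef_bound(lam), 4),
          [round(t(lam), 4) for t in (normal_tail, exponential_tail, rectangular_tail)])
```

For λ = 3, the case of the numerical example in the text, the bound is .11 while the normal tail is below .003.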

The problem of closer inequalities has been dealt with in recent
papers by several mathematicians. Camp, Guldberg, Meidel, and
Narumi have succeeded particularly well by placing certain mild
restrictions on the nature of the population function F(x). The
restrictions are of such a nature as to leave the distribution
function sufficiently general to be useful in the actual problems
of statistics. The main restriction placed on F(x) by Camp is
that it is to be a monotonic decreasing function of |x| when
|x| ≥ cσ, c ≥ 0. The general effect of this restriction is to exclude
distributions that are not represented by decreasing functions of
|x| at points more than a certain assigned distance from the
origin.


With the origin so chosen that zero is at the mean, Camp
reaches the general inequality²


Bs (7) 
where T 


g m 28 
Qu. = ER and Q--— E CTI 


and $P_{\lambda\sigma}$ is the probability of a deviation equal to or greater than λσ.

¹ Cf. Rietz, Henry Lewis, Mathematical Statistics (1936), pp. 28–30,
140–144; and Arne Fisher, The Mathematical Theory of Probabilities (1922),
pp. 108–109, 115–116. The original work on the subject is in Jules Bien-
aymé, "Considérations à l'appui de la découverte de Laplace sur la loi de
probabilité dans la méthode des moindres carrés," Comptes Rendus des
séances de l'Académie des Sciences, Vol. 37 (1853), pp. 309–324; P. L.
Tchébychef, "Des valeurs moyennes," traduit du russe par N. de Khanikof,
Journal de mathématiques pures et appliquées, Deuxième Série, Vol. 12
(1867), pp. 177–184; and P. Pizzetti, "I fondamenti matematici per la critica
dei risultati sperimentali," Atti della Università di Genova (1892), pp. 113–333.
² Rietz, op. cit., pp. 140–144.
If c is 0, this reduces to 


= 2 ES 
rye s Be (5n) Ы (8) 


The difficulty with the use of either Tchébychef's formula or
its extensions is that their use usually depends on knowledge of
certain population parameters other than those for which hypoth-
eses are being tested. For example, the variance of the
sampling distribution of variances from any population equals
approximately

$$\frac{\sigma^4}{N}(\beta_2 - 1) \qquad (9)$$

From this it is evident that, to test a hypothesis regarding the
population standard deviation, knowledge must be had of the
population $\beta_2$. This greatly reduces the usefulness of inequalities
of this kind, since only limited knowledge regarding $\beta_2$ may be
available. All that can validly be done in cases like this is to
test various joint hypotheses regarding the population variance
and the population $\beta_2$.
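Formula (9) shows how strongly the spread of sample variances depends on $\beta_2$. A Monte Carlo sketch in Python illustrates this. It draws samples of N = 50 from a normal population ($\beta_2 = 3$) and from a rectangular population ($\beta_2 = 1.8$), both scaled to σ = 1, and compares the observed variance of the sample variances with $(\beta_2 - 1)/N$. The populations, sample size, number of repetitions, and seed are arbitrary assumptions, and since (9) is an approximation the agreement is close but not exact.

```python
import math
import random
import statistics

def variance_of_sample_variance(draw, n, reps, rng):
    """Monte Carlo variance, over reps samples of size n, of the sample
    variance computed with divisor n."""
    values = []
    for _ in range(reps):
        xs = [draw(rng) for _ in range(n)]
        m = sum(xs) / n
        values.append(sum((x - m) ** 2 for x in xs) / n)
    return statistics.pvariance(values)

rng = random.Random(2024)
n, reps = 50, 10_000
half = math.sqrt(3.0)   # rectangular on [-sqrt(3), sqrt(3)]: sigma = 1, beta_2 = 1.8
sim_normal = variance_of_sample_variance(lambda r: r.gauss(0.0, 1.0), n, reps, rng)
sim_rect = variance_of_sample_variance(lambda r: r.uniform(-half, half), n, reps, rng)
# Eq. (9) with sigma = 1 gives (beta_2 - 1) / N; beta_2 = 3 for the normal population.
print(round(sim_normal, 4), (3.0 - 1.0) / n)
print(round(sim_rect, 4), (1.8 - 1.0) / n)
```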
TABLE I.—FOUR-PLACE COMMON LOGARITHMS OF NUMBERS¹

 N  |    .00 |  .01 |  .02 |  .03 |  .04 |  .05 |  .06 |  .07 |  .08 |  .09 |  .10
1.0 | 0.0000 | 0043 | 0086 | 0128 | 0170 | 0212 | 0253 | 0294 | 0334 | 0374 | 0414
1.1 |   0414 | 0453 | 0492 | 0531 | 0569 | 0607 | 0645 | 0682 | 0719 | 0755 | 0792
1.2 |   0792 | 0828 | 0864 | 0899 | 0934 | 0969 | 1004 | 1038 | 1072 | 1106 | 1139
1.3 |   1139 | 1173 | 1206 | 1239 | 1271 | 1303 | 1335 | 1367 | 1399 | 1430 | 1461
1.4 |   1461 | 1492 | 1523 | 1553 | 1584 | 1614 | 1644 | 1673 | 1703 | 1732 | 1761

1.5 |   1761 | 1790 | 1818 | 1847 | 1875 | 1903 | 1931 | 1959 | 1987 | 2014 | 2041
1.6 |   2041 | 2068 | 2095 | 2122 | 2148 | 2175 | 2201 | 2227 | 2253 | 2279 | 2304
1.7 |   2304 | 2330 | 2355 | 2380 | 2405 | 2430 | 2455 | 2480 | 2504 | 2529 | 2553
1.8 |   2553 | 2577 | 2601 | 2625 | 2648 | 2672 | 2695 | 2718 | 2742 | 2765 | 2788
1.9 |   2788 | 2810 | 2833 | 2856 | 2878 | 2900 | 2923 | 2945 | 2967 | 2989 | 3010

2.0 | 0.3010 | 3032 | 3054 | 3075 | 3096 | 3118 | 3139 | 3160 | 3181 | 3201 | 3222
2.1 |   3222 | 3243 | 3263 | 3284 | 3304 | 3324 | 3345 | 3365 | 3385 | 3404 | 3424
2.2 |   3424 | 3444 | 3464 | 3483 | 3502 | 3522 | 3541 | 3560 | 3579 | 3598 | 3617
2.3 |   3617 | 3636 | 3655 | 3674 | 3692 | 3711 | 3729 | 3747 | 3766 | 3784 | 3802
2.4 |   3802 | 3820 | 3838 | 3856 | 3874 | 3892 | 3909 | 3927 | 3945 | 3962 | 3979

2.5 |   3979 | 3997 | 4014 | 4031 | 4048 | 4065 | 4082 | 4099 | 4116 | 4133 | 4150
2.6 |   4150 | 4166 | 4183 | 4200 | 4216 | 4232 | 4249 | 4265 | 4281 | 4298 | 4314
2.7 |   4314 | 4330 | 4346 | 4362 | 4378 | 4393 | 4409 | 4425 | 4440 | 4456 | 4472
2.8 |   4472 | 4487 | 4502 | 4518 | 4533 | 4548 | 4564 | 4579 | 4594 | 4609 | 4624
2.9 |   4624 | 4639 | 4654 | 4669 | 4683 | 4698 | 4713 | 4728 | 4742 | 4757 | 4771

3.0 | 0.4771 | 4786 | 4800 | 4814 | 4829 | 4843 | 4857 | 4871 | 4886 | 4900 | 4914
3.1 |   4914 | 4928 | 4942 | 4955 | 4969 | 4983 | 4997 | 5011 | 5024 | 5038 | 5051
3.2 |   5051 | 5065 | 5079 | 5092 | 5105 | 5119 | 5132 | 5145 | 5159 | 5172 | 5185
3.3 |   5185 | 5198 | 5211 | 5224 | 5237 | 5250 | 5263 | 5276 | 5289 | 5302 | 5315
3.4 |   5315 | 5328 | 5340 | 5353 | 5366 | 5378 | 5391 | 5403 | 5416 | 5428 | 5441

3.5 |   5441 | 5453 | 5465 | 5478 | 5490 | 5502 | 5514 | 5527 | 5539 | 5551 | 5563
3.6 |   5563 | 5575 | 5587 | 5599 | 5611 | 5623 | 5635 | 5647 | 5658 | 5670 | 5682
3.7 |   5682 | 5694 | 5705 | 5717 | 5729 | 5740 | 5752 | 5763 | 5775 | 5786 | 5798
3.8 |   5798 | 5809 | 5821 | 5832 | 5843 | 5855 | 5866 | 5877 | 5888 | 5899 | 5911
3.9 |   5911 | 5922 | 5933 | 5944 | 5955 | 5966 | 5977 | 5988 | 5999 | 6010 | 6021

4.0 | 0.6021 | 6031 | 6042 | 6053 | 6064 | 6075 | 6085 | 6096 | 6107 | 6117 | 6128
4.1 |   6128 | 6138 | 6149 | 6160 | 6170 | 6180 | 6191 | 6201 | 6212 | 6222 | 6232
4.2 |   6232 | 6243 | 6253 | 6263 | 6274 | 6284 | 6294 | 6304 | 6314 | 6325 | 6335
4.3 |   6335 | 6345 | 6355 | 6365 | 6375 | 6385 | 6395 | 6405 | 6415 | 6425 | 6435
4.4 |   6435 | 6444 | 6454 | 6464 | 6474 | 6484 | 6493 | 6503 | 6513 | 6522 | 6532

4.5 |   6532 | 6542 | 6551 | 6561 | 6571 | 6580 | 6590 | 6599 | 6609 | 6618 | 6628
4.6 |   6628 | 6637 | 6646 | 6656 | 6665 | 6675 | 6684 | 6693 | 6702 | 6712 | 6721
4.7 |   6721 | 6730 | 6739 | 6749 | 6758 | 6767 | 6776 | 6785 | 6794 | 6803 | 6812
4.8 |   6812 | 6821 | 6830 | 6839 | 6848 | 6857 | 6866 | 6875 | 6884 | 6893 | 6902
4.9 |   6902 | 6911 | 6920 | 6928 | 6937 | 6946 | 6955 | 6964 | 6972 | 6981 | 6990

5.0 | 0.6990 | 6998 | 7007 | 7016 | 7024 | 7033 | 7042 | 7050 | 7059 | 7067 | 7076
5.1 |   7076 | 7084 | 7093 | 7101 | 7110 | 7118 | 7126 | 7135 | 7143 | 7152 | 7160
5.2 |   7160 | 7168 | 7177 | 7185 | 7193 | 7202 | 7210 | 7218 | 7226 | 7235 | 7243
5.3 |   7243 | 7251 | 7259 | 7267 | 7275 | 7284 | 7292 | 7300 | 7308 | 7316 | 7324
5.4 |   7324 | 7332 | 7340 | 7348 | 7356 | 7364 | 7372 | 7380 | 7388 | 7396 | 7404

¹ Taken, with permission, from E. V. Huntington's Four Place Tables of Logarithms and Trigonometric Functions (Harvard Cooperative Society, Inc., 1907).


460 SAMPLING STATISTICS AND APPLICATIONS 


TABLE I.—FOUR-PLACE COMMON LOGARITHMS OF NUMBERS.—(Continued)

 N   |      0 |    1 |    2
1.50 | 0.1761 | 1764 | 1767
1.51 |   1790 | 1793 | 1796
1.52 |   1818 | 1821 | 1824
1.53 |   1847 | 1850 | 1853
1.54 |   1875 | 1878 | 1881

1.55 |   1903 | 1906 | 1909
1.56 |   1931 | 1934 | 1937
1.57 |   1959 | 1962 | 1965
1.58 |   1987 | 1989 | 1992
1.59 |   2014 | 2017 | 2019

1.60 | 0.2041 | 2044 | 2047
1.61 |   2068 | 2071 | 2074
1.62 |   2095 | 2098 | 2101
1.63 |   2122 | 2125 | 2127
1.64 |   2148 | 2151 | 2154

1.65 |   2175 | 2177 | 2180
1.66 |   2201 | 2204 | 2206
1.67 |   2227 | 2230 | 2232
1.68 |   2253 | 2256 | 2258
1.69 |   2279 | 2281 | 2284

1.70 | 0.2304 | 2307 | 2310
1.71 |   2330 | 2333 | 2335
1.72 |   2355 | 2358 | 2360
1.73 |   2380 | 2383 | 2385
1.74 |   2405 | 2408 | 2410

1.75 |   2430 | 2433 | 2435
1.76 |   2455 | 2458 | 2460
1.77 |   2480 | 2482 | 2485
1.78 |   2504 | 2507 | 2509
1.79 |   2529 | 2531 | 2533

1.80 | 0.2553 | 2555 | 2558
1.81 |   2577 | 2579 | 2582
1.82 |   2601 | 2603 | 2605
1.83 |   2625 | 2627 | 2629
1.84 |   2648 | 2651 | 2653

1.85 |   2672 | 2674 | 2676
1.86 |   2695 | 2697 | 2700
1.87 |   2718 | 2721 | 2723
1.88 |   2742 | 2744 | 2746
1.89 |   2765 | 2767 | 2769

1.90 | 0.2788 | 2790 | 2792
1.91 |   2810 | 2813 | 2815
1.92 |   2833 | 2835 | 2838
1.93 |   2856 | 2858 | 2860
1.94 |   2878 | 2880 | 2882

APPENDIX 


TABLE II.—SQUARES OF NUMBERS¹

  N |      0 |      1 |      2 |      3 |      4 |      5 |      6 |      7 |      8 |      9
100 |  10000 |  10201 |  10404 |  10609 |  10816 |  11025 |  11236 |  11449 |  11664 |  11881
110 |  12100 |  12321 |  12544 |  12769 |  12996 |  13225 |  13456 |  13689 |  13924 |  14161
120 |  14400 |  14641 |  14884 |  15129 |  15376 |  15625 |  15876 |  16129 |  16384 |  16641
130 |  16900 |  17161 |  17424 |  17689 |  17956 |  18225 |  18496 |  18769 |  19044 |  19321
140 |  19600 |  19881 |  20164 |  20449 |  20736 |  21025 |  21316 |  21609 |  21904 |  22201

150 |  22500 |  22801 |  23104 |  23409 |  23716 |  24025 |  24336 |  24649 |  24964 |  25281
160 |  25600 |  25921 |  26244 |  26569 |  26896 |  27225 |  27556 |  27889 |  28224 |  28561
170 |  28900 |  29241 |  29584 |  29929 |  30276 |  30625 |  30976 |  31329 |  31684 |  32041
180 |  32400 |  32761 |  33124 |  33489 |  33856 |  34225 |  34596 |  34969 |  35344 |  35721
190 |  36100 |  36481 |  36864 |  37249 |  37636 |  38025 |  38416 |  38809 |  39204 |  39601

200 |  40000 |  40401 |  40804 |  41209 |  41616 |  42025 |  42436 |  42849 |  43264 |  43681
210 |  44100 |  44521 |  44944 |  45369 |  45796 |  46225 |  46656 |  47089 |  47524 |  47961
220 |  48400 |  48841 |  49284 |  49729 |  50176 |  50625 |  51076 |  51529 |  51984 |  52441
230 |  52900 |  53361 |  53824 |  54289 |  54756 |  55225 |  55696 |  56169 |  56644 |  57121
240 |  57600 |  58081 |  58564 |  59049 |  59536 |  60025 |  60516 |  61009 |  61504 |  62001

250 |  62500 |  63001 |  63504 |  64009 |  64516 |  65025 |  65536 |  66049 |  66564 |  67081
260 |  67600 |  68121 |  68644 |  69169 |  69696 |  70225 |  70756 |  71289 |  71824 |  72361
270 |  72900 |  73441 |  73984 |  74529 |  75076 |  75625 |  76176 |  76729 |  77284 |  77841
280 |  78400 |  78961 |  79524 |  80089 |  80656 |  81225 |  81796 |  82369 |  82944 |  83521
290 |  84100 |  84681 |  85264 |  85849 |  86436 |  87025 |  87616 |  88209 |  88804 |  89401

300 |  90000 |  90601 |  91204 |  91809 |  92416 |  93025 |  93636 |  94249 |  94864 |  95481
310 |  96100 |  96721 |  97344 |  97969 |  98596 |  99225 |  99856 | 100489 | 101124 | 101761
320 | 102400 | 103041 | 103684 | 104329 | 104976 | 105625 | 106276 | 106929 | 107584 | 108241
330 | 108900 | 109561 | 110224 | 110889 | 111556 | 112225 | 112896 | 113569 | 114244 | 114921
340 | 115600 | 116281 | 116964 | 117649 | 118336 | 119025 | 119716 | 120409 | 121104 | 121801

350 | 122500 | 123201 | 123904 | 124609 | 125316 | 126025 | 126736 | 127449 | 128164 | 128881
360 | 129600 | 130321 | 131044 | 131769 | 132496 | 133225 | 133956 | 134689 | 135424 | 136161
370 | 136900 | 137641 | 138384 | 139129 | 139876 | 140625 | 141376 | 142129 | 142884 | 143641
380 | 144400 | 145161 | 145924 | 146689 | 147456 | 148225 | 148996 | 149769 | 150544 | 151321
390 | 152100 | 152881 | 153664 | 154449 | 155236 | 156025 | 156816 | 157609 | 158404 | 159201

400 | 160000 | 160801 | 161604 | 162409 | 163216 | 164025 | 164836 | 165649 | 166464 | 167281
410 | 168100 | 168921 | 169744 | 170569 | 171396 | 172225 | 173056 | 173889 | 174724 | 175561
420 | 176400 | 177241 | 178084 | 178929 | 179776 | 180625 | 181476 | 182329 | 183184 | 184041
430 | 184900 | 185761 | 186624 | 187489 | 188356 | 189225 | 190096 | 190969 | 191844 | 192721
440 | 193600 | 194481 | 195364 | 196249 | 197136 | 198025 | 198916 | 199809 | 200704 | 201601

450 | 202500 | 203401 | 204304 | 205209 | 206116 | 207025 | 207936 | 208849 | 209764 | 210681
460 | 211600 | 212521 | 213444 | 214369 | 215296 | 216225 | 217156 | 218089 | 219024 | 219961
470 | 220900 | 221841 | 222784 | 223729 | 224676 | 225625 | 226576 | 227529 | 228484 | 229441
480 | 230400 | 231361 | 232324 | 233289 | 234256 | 235225 | 236196 | 237169 | 238144 | 239121
490 | 240100 | 241081 | 242064 | 243049 | 244036 | 245025 | 246016 | 247009 | 248004 | 249001

500 | 250000 | 251001 | 252004 | 253009 | 254016 | 255025 | 256036 | 257049 | 258064 | 259081
510 | 260100 | 261121 | 262144 | 263169 | 264196 | 265225 | 266256 | 267289 | 268324 | 269361
520 | 270400 | 271441 | 272484 | 273529 | 274576 | 275625 | 276676 | 277729 | 278784 | 279841
530 | 280900 | 281961 | 283024 | 284089 | 285156 | 286225 | 287296 | 288369 | 289444 | 290521
540 | 291600 | 292681 | 293764 | 294849 | 295936 | 297025 | 298116 | 299209 | 300304 | 301401

¹ Source: WAUGH, ALBERT E., Laboratory Manual and Problems for Elements of Statistical Method (McGraw-Hill Book Company, Inc., 1944).


TABLE II.—SQUARES OF NUMBERS.—(Continued)

  N |      0 |      1 |      2 |      3 |      4 |      5 |      6 |      7 |      8 |      9
550 | 302500 | 303601 | 304704 | 305809 | 306916 | 308025 | 309136 | 310249 | 311364 | 312481
560 | 313600 | 314721 | 315844 | 316969 | 318096 | 319225 | 320356 | 321489 | 322624 | 323761
570 | 324900 | 326041 | 327184 | 328329 | 329476 | 330625 | 331776 | 332929 | 334084 | 335241
580 | 336400 | 337561 | 338724 | 339889 | 341056 | 342225 | 343396 | 344569 | 345744 | 346921
590 | 348100 | 349281 | 350464 | 351649 | 352836 | 354025 | 355216 | 356409 | 357604 | 358801

600 | 360000 | 361201 | 362404 | 363609 | 364816 | 366025 | 367236 | 368449 | 369664 | 370881
610 | 372100 | 373321 | 374544 | 375769 | 376996 | 378225 | 379456 | 380689 | 381924 | 383161
620 | 384400 | 385641 | 386884 | 388129 | 389376 | 390625 | 391876 | 393129 | 394384 | 395641
630 | 396900 | 398161 | 399424 | 400689 | 401956 | 403225 | 404496 | 405769 | 407044 | 408321
640 | 409600 | 410881 | 412164 | 413449 | 414736 | 416025 | 417316 | 418609 | 419904 | 421201

650 | 422500 | 423801 | 425104 | 426409 | 427716 | 429025 | 430336 | 431649 | 432964 | 434281
660 | 435600 | 436921 | 438244 | 439569 | 440896 | 442225 | 443556 | 444889 | 446224 | 447561
670 | 448900 | 450241 | 451584 | 452929 | 454276 | 455625 | 456976 | 458329 | 459684 | 461041
680 | 462400 | 463761 | 465124 | 466489 | 467856 | 469225 | 470596 | 471969 | 473344 | 474721
690 | 476100 | 477481 | 478864 | 480249 | 481636 | 483025 | 484416 | 485809 | 487204 | 488601

700 | 490000 | 491401 | 492804 | 494209 | 495616 | 497025 | 498436 | 499849 | 501264 | 502681
710 | 504100 | 505521 | 506944 | 508369 | 509796 | 511225 | 512656 | 514089 | 515524 | 516961
720 | 518400 | 519841 | 521284 | 522729 | 524176 | 525625 | 527076 | 528529 | 529984 | 531441
730 | 532900 | 534361 | 535824 | 537289 | 538756 | 540225 | 541696 | 543169 | 544644 | 546121
740 | 547600 | 549081 | 550564 | 552049 | 553536 | 555025 | 556516 | 558009 | 559504 | 561001

750 | 562500 | 564001 | 565504 | 567009 | 568516 | 570025 | 571536 | 573049 | 574564 | 576081
760 | 577600 | 579121 | 580644 | 582169 | 583696 | 585225 | 586756 | 588289 | 589824 | 591361
770 | 592900 | 594441 | 595984 | 597529 | 599076 | 600625 | 602176 | 603729 | 605284 | 606841
780 | 608400 | 609961 | 611524 | 613089 | 614656 | 616225 | 617796 | 619369 | 620944 | 622521
790 | 624100 | 625681 | 627264 | 628849 | 630436 | 632025 | 633616 | 635209 | 636804 | 638401

800 | 640000 | 641601 | 643204 | 644809 | 646416 | 648025 | 649636 | 651249 | 652864 | 654481
810 | 656100 | 657721 | 659344 | 660969 | 662596 | 664225 | 665856 | 667489 | 669124 | 670761
820 | 672400 | 674041 | 675684 | 677329 | 678976 | 680625 | 682276 | 683929 | 685584 | 687241
830 | 688900 | 690561 | 692224 | 693889 | 695556 | 697225 | 698896 | 700569 | 702244 | 703921
840 | 705600 | 707281 | 708964 | 710649 | 712336 | 714025 | 715716 | 717409 | 719104 | 720801

850 | 722500 | 724201 | 725904 | 727609 | 729316 | 731025 | 732736 | 734449 | 736164 | 737881
860 | 739600 | 741321 | 743044 | 744769 | 746496 | 748225 | 749956 | 751689 | 753424 | 755161
870 | 756900 | 758641 | 760384 | 762129 | 763876 | 765625 | 767376 | 769129 | 770884 | 772641
880 | 774400 | 776161 | 777924 | 779689 | 781456 | 783225 | 784996 | 786769 | 788544 | 790321
890 | 792100 | 793881 | 795664 | 797449 | 799236 | 801025 | 802816 | 804609 | 806404 | 808201

900 | 810000 | 811801 | 813604 | 815409 | 817216 | 819025 | 820836 | 822649 | 824464 | 826281
910 | 828100 | 829921 | 831744 | 833569 | 835396 | 837225 | 839056 | 840889 | 842724 | 844561
920 | 846400 | 848241 | 850084 | 851929 | 853776 | 855625 | 857476 | 859329 | 861184 | 863041
930 | 864900 | 866761 | 868624 | 870489 | 872356 | 874225 | 876096 | 877969 | 879844 | 881721
940 | 883600 | 885481 | 887364 | 889249 | 891136 | 893025 | 894916 | 896809 | 898704 | 900601

950 | 902500 | 904401 | 906304 | 908209 | 910116 | 912025 | 913936 | 915849 | 917764 | 919681
960 | 921600 | 923521 | 925444 | 927369 | 929296 | 931225 | 933156 | 935089 | 937024 | 938961
970 | 940900 | 942841 | 944784 | 946729 | 948676 | 950625 | 952576 | 954529 | 956484 | 958441
980 | 960400 | 962361 | 964324 | 966289 | 968256 | 970225 | 972196 | 974169 | 976144 | 978121
990 | 980100 | 982081 | 984064 | 986049 | 988036 | 990025 | 992016 | 994009 | 996004 | 998001


TABLE III.—SQUARE ROOTS OF NUMBERS FROM 10 TO 100¹

 N |   0.0 |   0.1 |   0.2 |   0.3 |   0.4 |   0.5 |   0.6 |   0.7 |   0.8 |   0.9
10 | 3.162 | 3.178 | 3.194 |       |       |       |       |       | 3.286 | 3.302
11 | 3.317 | 3.332 | 3.347 |       |       |       |       |       | 3.435 | 3.450
12 | 3.464 | 3.479 | 3.493 |       |       |       |       |       | 3.578 | 3.592
13 | 3.606 | 3.619 | 3.633 |       |       |       |       |       | 3.715 | 3.728
14 | 3.742 | 3.755 | 3.768 |       |       |       |       |       | 3.847 | 3.860

15 | 3.873 | 3.886 | 3.899 |       |       |       |       |       | 3.975 | 3.987
16 | 4.000 | 4.012 | 4.025 |       |       |       |       |       | 4.099 | 4.111
17 | 4.123 | 4.135 | 4.147 |       |       |       |       |       | 4.219 | 4.231
18 | 4.243 | 4.254 | 4.266 |       |       |       |       |       | 4.336 | 4.347
19 | 4.359 | 4.370 | 4.382 |       |       |       |       |       | 4.450 | 4.461

20 |       |       |       |       |       |       |       |       | 4.561 | 4.572
21 |       |       |       |       |       |       |       |       | 4.669 | 4.680
22 |       |       |       |       |       |       |       |       | 4.775 | 4.785
23 |       |       |       |       |       |       |       |       |       | 4.889
24 |       |       |       |       |       |       |       |       | 4.980 | 4.990

25 |       |       |       |       |       |       |       |       | 5.079 | 5.089
26 |       |       |       |       |       |       |       |       | 5.177 | 5.187
27 |       |       |       |       |       |       |       |       | 5.273 | 5.282
28 |       |       |       |       |       |       |       |       | 5.367 | 5.376
29 |       |       |       |       |       |       |       |       | 5.459 | 5.468

45 | 6.708 | 6.716 | 6.723
46 | 6.782 | 6.790 | 6.797
47 | 6.856 | 6.863 | 6.870
48 | 6.928 | 6.935 | 6.943
49 | 7.000 | 7.007 | 7.014

50 | 7.071 | 7.078 | 7.085
51 | 7.141 | 7.148 | 7.155
52 | 7.211 | 7.218 | 7.225
53 | 7.280 | 7.287 | 7.294
54 | 7.348 | 7.355 | 7.362

¹ Source: WAUGH, ALBERT E., Laboratory Manual and Problems for Elements of Statistical Method (McGraw-Hill Book Company, Inc., 1944).




TABLE III.—SQUARE ROOTS OF NUMBERS FROM 10 TO 100.—(Continued)

 N |   0.0 |   0.1 |   0.2 |   0.3 |   0.4 |   0.5 |   0.6 |   0.7 |   0.8 |   0.9
55 | 7.416 | 7.423 | 7.430 | 7.436 | 7.443 | 7.450 | 7.457 | 7.463 | 7.470 | 7.477
56 | 7.483 | 7.490 | 7.497 | 7.503 | 7.510 | 7.517 | 7.523 | 7.530 | 7.537 | 7.543
57 | 7.550 | 7.556 | 7.563 | 7.570 | 7.576 | 7.583 | 7.589 | 7.596 | 7.603 | 7.609
58 | 7.616 | 7.622 | 7.629 | 7.635 | 7.642 | 7.649 | 7.655 | 7.662 | 7.668 | 7.675
59 | 7.681 | 7.688 | 7.694 | 7.701 | 7.707 | 7.714 | 7.720 | 7.727 | 7.733 | 7.740

60 | 7.746 | 7.752 | 7.759 | 7.765 | 7.772 | 7.778 | 7.785 | 7.791 | 7.797 | 7.804
61 | 7.810 | 7.817 | 7.823 | 7.829 | 7.836 | 7.842 | 7.849 | 7.855 | 7.861 | 7.868
62 | 7.874 | 7.880 | 7.887 | 7.893 | 7.899 | 7.906 | 7.912 | 7.918 | 7.925 | 7.931
63 | 7.937 | 7.944 | 7.950 | 7.956 | 7.962 | 7.969 | 7.975 | 7.981 | 7.987 | 7.994
64 | 8.000 | 8.006 | 8.012 | 8.019 | 8.025 | 8.031 | 8.037 | 8.044 | 8.050 | 8.056

70 | 8.367 | 8.373 | 8.379 | 8.385 | 8.390 | 8.396 | 8.402 | 8.408 | 8.414 | 8.420
71 | 8.426 | 8.432 | 8.438 | 8.444 | 8.450 | 8.456 | 8.462 | 8.468 | 8.473 | 8.479
72 | 8.485 | 8.491 | 8.497 | 8.503 | 8.509 | 8.515 | 8.521 | 8.526 | 8.532 | 8.538
73 | 8.544 | 8.550 | 8.556 | 8.562 | 8.567 | 8.573 | 8.579 | 8.585 | 8.591 | 8.597
74 | 8.602 | 8.608 | 8.614 | 8.620 | 8.626 | 8.631 | 8.637 | 8.643 | 8.649 | 8.654

75 | 8.660 | 8.666 | 8.672 | 8.678 | 8.683 | 8.689 | 8.695 | 8.701 | 8.706 | 8.712
76 | 8.718 | 8.724 | 8.729 | 8.735 | 8.741 | 8.746 | 8.752 | 8.758 | 8.764 | 8.769
77 | 8.775 | 8.781 | 8.786 | 8.792 | 8.798 | 8.803 | 8.809 | 8.815 | 8.820 | 8.826
78 | 8.832 | 8.837 | 8.843 | 8.849 | 8.854 | 8.860 | 8.866 | 8.871 | 8.877 | 8.883
79 | 8.888 | 8.894 | 8.899 | 8.905 | 8.911 | 8.916 | 8.922 | 8.927 | 8.933 | 8.939

80 | 8.944 | 8.950 | 8.955 | 8.961 | 8.967 | 8.972 | 8.978 | 8.983 | 8.989 | 8.994
81 | 9.000 | 9.006 | 9.011 | 9.017 | 9.022 | 9.028 | 9.033 | 9.039 | 9.044 | 9.050
82 | 9.055 | 9.061 | 9.066 | 9.072 | 9.077 | 9.083 | 9.088 | 9.094 | 9.099 | 9.105
84 | 9.165 | 9.171 | 9.176 | 9.182 | 9.187 | 9.192 | 9.198 | 9.203 | 9.209 | 9.214

85 | 9.220 | 9.225 | 9.230 | 9.236 | 9.241 | 9.247 | 9.252 | 9.257 | 9.263 | 9.268
86 | 9.274 | 9.279 | 9.284 | 9.290 | 9.295 | 9.301 | 9.306 | 9.311 | 9.317 | 9.322
87 | 9.327 | 9.333 | 9.338 | 9.343 | 9.349 | 9.354 | 9.359 | 9.365 | 9.370 | 9.376
88 | 9.381 | 9.386 | 9.391 | 9.397 | 9.402 | 9.407 | 9.413 | 9.418 | 9.423 | 9.429
89 | 9.434 | 9.439 | 9.445 | 9.450 | 9.455 | 9.460 | 9.466 | 9.471 | 9.476 | 9.482

90 | 9.487 | 9.492 | 9.497 | 9.503 | 9.508 | 9.513 | 9.518 | 9.524 | 9.529 | 9.534
91 |       |       |       |       |       |       | 9.571 | 9.576 | 9.581 | 9.586
92 |       |       |       |       |       |       | 9.623 | 9.628 | 9.633 | 9.638
93 |       |       |       |       |       |       | 9.675 | 9.680 | 9.685 | 9.690
94 |       |       |       |       |       |       | 9.726 | 9.731 | 9.737 | 9.742
TABLE IV.—SQUARE ROOTS OF NUMBERS FROM 100 TO 1000¹

  N |     0 |     1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9
100 | 10.00 | 10.05 | 10.10 | 10.15 | 10.20 | 10.25 | 10.30 | 10.34 | 10.39 | 10.44
110 | 10.49 | 10.54 | 10.58 | 10.63 | 10.68 | 10.72 | 10.77 | 10.82 | 10.86 | 10.91
120 | 10.95 | 11.00 | 11.05 | 11.09 | 11.14 | 11.18 | 11.22 | 11.27 | 11.31 | 11.36
130 | 11.40 | 11.45 | 11.49 | 11.53 | 11.58 | 11.62 | 11.66 | 11.70 | 11.75 | 11.79
140 | 11.83 | 11.87 | 11.92 | 11.96 | 12.00 | 12.04 | 12.08 | 12.12 | 12.17 | 12.21

150 | 12.25 | 12.29 | 12.33 | 12.37 | 12.41 | 12.45 | 12.49 | 12.53 | 12.57 | 12.61
160 | 12.65 | 12.69 | 12.73 | 12.77 | 12.81 | 12.85 | 12.88 | 12.92 | 12.96 | 13.00
170 | 13.04 | 13.08 | 13.11 | 13.15 | 13.19 | 13.23 | 13.27 | 13.30 | 13.34 | 13.38
180 | 13.42 | 13.45 | 13.49 | 13.53 | 13.56 | 13.60 | 13.64 | 13.67 | 13.71 | 13.75
190 | 13.78 | 13.82 | 13.86 | 13.89 | 13.93 | 13.96 | 14.00 | 14.04 | 14.07 | 14.11

200 | 14.14 | 14.18 | 14.21 | 14.25 | 14.28 | 14.32 | 14.35 | 14.39 | 14.42 | 14.46
210 | 14.49 | 14.53 | 14.56 | 14.59 | 14.63 | 14.66 | 14.70 | 14.73 | 14.76 | 14.80
220 | 14.83 | 14.87 | 14.90 | 14.93 | 14.97 | 15.00 | 15.03 | 15.07 | 15.10 | 15.13
230 | 15.17 | 15.20 | 15.23 | 15.26 | 15.30 | 15.33 | 15.36 | 15.39 | 15.43 | 15.46
240 | 15.49 | 15.52 | 15.56 | 15.59 | 15.62 | 15.65 | 15.68 | 15.72 | 15.75 | 15.78

250 | 15.81 | 15.84 | 15.87 | 15.91 | 15.94 | 15.97 | 16.00 | 16.03 | 16.06 | 16.09
260 | 16.12 | 16.16 | 16.19 | 16.22 | 16.25 | 16.28 | 16.31 | 16.34 | 16.37 | 16.40
270 | 16.43 | 16.46 | 16.49 | 16.52 | 16.55 | 16.58 | 16.61 | 16.64 | 16.67 | 16.70
280 | 16.73 | 16.76 | 16.79 | 16.82 | 16.85 | 16.88 | 16.91 | 16.94 | 16.97 | 17.00
290 | 17.03 | 17.06 | 17.09 | 17.12 | 17.15 | 17.18 | 17.20 | 17.23 | 17.26 | 17.29

300 | 17.32 | 17.35 | 17.38 | 17.41 | 17.44 | 17.46 | 17.49 | 17.52 | 17.55 | 17.58
310 | 17.61 | 17.64 | 17.66 | 17.69 | 17.72 | 17.75 | 17.78 | 17.80 | 17.83 | 17.86
320 | 17.89 | 17.92 | 17.94 | 17.97 | 18.00 | 18.03 | 18.06 | 18.08 | 18.11 | 18.14
330 | 18.17 | 18.19 | 18.22 | 18.25 | 18.28 | 18.30 | 18.33 | 18.36 | 18.38 | 18.41
340 | 18.44 | 18.47 | 18.49 | 18.52 | 18.55 | 18.57 | 18.60 | 18.63 | 18.65 | 18.68

350 | 18.71 | 18.74 | 18.76 | 18.79 | 18.81 | 18.84 | 18.87 | 18.89 | 18.92 | 18.95
360 | 18.97 | 19.00 | 19.03 | 19.05 | 19.08 | 19.10 | 19.13 | 19.16 | 19.18 | 19.21
370 | 19.24 | 19.26 | 19.29 | 19.31 | 19.34 | 19.36 | 19.39 | 19.42 | 19.44 | 19.47
380 | 19.49 | 19.52 | 19.54 | 19.57 | 19.60 | 19.62 | 19.65 | 19.67 | 19.70 | 19.72
390 | 19.75 | 19.77 | 19.80 | 19.82 | 19.85 | 19.87 | 19.90 | 19.92 | 19.95 | 19.98

400 | 20.00 | 20.02 | 20.05 | 20.07 | 20.10 | 20.12 | 20.15 | 20.17 | 20.20 | 20.22
410 | 20.25 | 20.27 | 20.30 | 20.32 | 20.35 | 20.37 | 20.40 | 20.42 | 20.44 | 20.47
420 | 20.49 | 20.52 | 20.54 | 20.57 | 20.59 | 20.62 | 20.64 | 20.66 | 20.69 | 20.71
430 | 20.74 | 20.76 | 20.78 | 20.81 | 20.83 | 20.86 | 20.88 | 20.90 | 20.93 | 20.95
440 | 20.98 | 21.00 | 21.02 | 21.05 | 21.07 | 21.10 | 21.12 | 21.14 | 21.17 | 21.19

450 |       |       |       |       |       |       |       |       |       | 21.42
460 |       |       |       |       |       |       |       |       |       | 21.66
470 |       |       |       |       |       |       |       |       |       | 21.89
480 |       |       |       |       |       |       |       |       |       | 22.11
490 |       |       |       |       |       |       |       |       |       | 22.34

500 | 22.36 | 22.38 |       |       |       | 22.47 |       |       |       | 22.56
510 | 22.58 | 22.61 |       |       |       | 22.69 |       |       |       | 22.78
520 | 22.80 | 22.83 |       |       |       | 22.91 |       |       |       | 23.00
530 | 23.02 | 23.04 |       |       |       | 23.13 |       |       |       | 23.22
540 | 23.24 | 23.26 |       |       |       | 23.35 |       |       |       | 23.43
550 | 23.45 | 23.47 |       |       | 23.54 | 23.56 |       |       |       | 23.64

¹ Source: WAUGH, ALBERT E., Laboratory Manual and Problems for Elements of Statistical Method (McGraw-Hill Book Company, Inc., 1944).
TABLE IV.—SQUARE ROOTS OF NUMBERS FROM 100 TO 1000.—(Continued)

  N |     0 |     1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9
560 |       |       |       |       | 23.75 |       |       |       |       | 23.85
570 |       |       |       |       | 23.96 |       |       |       |       | 24.06
580 |       |       |       |       | 24.17 |       |       |       |       | 24.27
590 |       |       |       |       | 24.37 |       |       |       |       | 24.47

600 |       |       |       |       | 24.58 | 24.60 | 24.62 | 24.64 | 24.66 | 24.68
610 |       |       |       |       | 24.78 | 24.80 | 24.82 | 24.84 | 24.86 | 24.88
620 |       |       |       |       | 24.98 | 25.00 | 25.02 | 25.04 | 25.06 | 25.08
630 |       |       |       |       | 25.18 | 25.20 | 25.22 | 25.24 | 25.26 | 25.28
640 |       |       |       |       | 25.38 | 25.40 | 25.42 | 25.44 | 25.46 | 25.48

800 | 28.28 | 28.30 | 28.32 | 28.34 | 28.35 | 28.37 | 28.39 | 28.41 | 28.43 | 28.44
810 | 28.46 | 28.48 | 28.50 | 28.51 | 28.53 | 28.55 | 28.57 | 28.58 | 28.60 | 28.62
820 | 28.64 | 28.65 | 28.67 | 28.69 | 28.71 | 28.72 | 28.74 | 28.76 | 28.78 | 28.79
830 | 28.81 | 28.83 | 28.84 | 28.86 | 28.88 | 28.90 | 28.91 | 28.93 | 28.95 | 28.97
840 | 28.98 | 29.00 | 29.02 | 29.03 | 29.05 | 29.07 | 29.09 | 29.10 | 29.12 | 29.14

850 | 29.15 | 29.17 | 29.19 | 29.21 | 29.22 | 29.24 | 29.26 | 29.27 | 29.29 | 29.31
860 | 29.33 | 29.34 | 29.36 | 29.38 | 29.39 | 29.41 | 29.43 | 29.44 | 29.46 | 29.48
870 | 29.50 | 29.51 | 29.53 | 29.55 | 29.56 | 29.58 | 29.60 | 29.61 | 29.63 | 29.65
880 | 29.66 | 29.68 | 29.70 | 29.72 | 29.73 | 29.75 | 29.77 | 29.78 | 29.80 | 29.82
890 | 29.83 | 29.85 | 29.87 | 29.88 | 29.90 | 29.92 | 29.93 | 29.95 | 29.97 | 29.98




TABLE V.—RECIPROCALS OF NUMBERS¹

  N  |    .00 |   .01 |   .02 |   .03 |   .04 |   .05 |   .06 |   .07 |   .08 |   .09
1.00 | 1.0000 | .9901 | .9804 | .9709 | .9615 | .9524 | .9434 | .9346 | .9259 | .9174
1.10 |  .9091 | .9009 | .8929 | .8850 | .8772 | .8696 | .8621 | .8547 | .8475 | .8403
1.20 |  .8333 | .8264 | .8197 | .8130 | .8065 | .8000 | .7937 | .7874 | .7812 | .7752
1.30 |  .7692 | .7634 | .7576 | .7519 | .7463 | .7407 | .7353 | .7299 | .7246 | .7194
1.40 |  .7143 | .7092 | .7042 | .6993 | .6944 | .6897 | .6849 | .6803 | .6757 | .6711

1.50 |  .6667 | .6623 | .6579 | .6536 | .6494 | .6452 | .6410 | .6369 | .6329 | .6289
1.60 |  .6250 | .6211 | .6173 | .6135 | .6098 | .6061 | .6024 | .5988 | .5952 | .5917
1.70 |  .5882 | .5848 | .5814 | .5780 | .5747 | .5714 | .5682 | .5650 | .5618 | .5587
1.80 |  .5556 | .5525 | .5495 | .5464 | .5435 | .5405 | .5376 | .5348 | .5319 | .5291
1.90 |  .5263 | .5236 | .5208 | .5181 | .5155 | .5128 | .5102 | .5076 | .5051 | .5025

2.00 |  .5000 | .4975 | .4950 | .4926 | .4902 | .4878 | .4854 | .4831 | .4808 | .4785
2.10 |  .4762 | .4739 | .4717 | .4695 | .4673 | .4651 | .4630 | .4608 | .4587 | .4566
2.20 |  .4545 | .4525 | .4504 | .4484 | .4464 | .4444 | .4425 | .4405 | .4386 | .4367
2.30 |  .4348 | .4329 | .4310 | .4292 | .4274 | .4255 | .4237 | .4219 | .4202 | .4184
2.40 |  .4167 | .4149 | .4132 | .4115 | .4098 | .4082 | .4065 | .4049 | .4032 | .4016

2.50 |  .4000 | .3984 | .3968 | .3953 | .3937 | .3922 | .3906 | .3891 | .3876 | .3861
2.60 |  .3846 | .3831 | .3817 | .3802 | .3788 | .3774 | .3759 | .3745 | .3731 | .3717
2.70 |  .3704 | .3690 | .3676 | .3663 | .3650 | .3636 | .3623 | .3610 | .3597 | .3584
2.80 |  .3571 | .3559 | .3546 | .3534 | .3521 | .3509 | .3497 | .3484 | .3472 | .3460
2.90 |  .3448 | .3436 | .3425 | .3413 | .3401 | .3390 | .3378 | .3367 | .3356 | .3344

3.00 |  .3333 | .3322 | .3311 | .3300 | .3289 | .3279 | .3268 | .3257 | .3247 | .3236
3.10 |  .3226 | .3215 | .3205 | .3195 | .3185 | .3175 | .3165 | .3155 | .3145 | .3135
3.20 |  .3125 | .3115 | .3106 | .3096 | .3086 | .3077 | .3067 | .3058 | .3049 | .3040
3.30 |  .3030 | .3021 | .3012 | .3003 | .2994 | .2985 | .2976 | .2967 | .2959 | .2950
3.40 |  .2941 | .2933 | .2924 | .2915 | .2907 | .2899 | .2890 | .2882 | .2874 | .2865

3.50 |  .2857 | .2849 | .2841 | .2833 | .2825 | .2817 | .2809 | .2801 | .2793 | .2786
3.60 |  .2778 | .2770 | .2762 | .2755 | .2747 | .2740 | .2732 | .2725 | .2717 | .2710
3.70 |  .2703 | .2695 | .2688 | .2681 | .2674 | .2667 | .2660 | .2653 | .2646 | .2639
3.80 |  .2632 | .2625 | .2618 | .2611 | .2604 | .2597 | .2591 | .2584 | .2577 | .2571
3.90 |  .2564 | .2558 | .2551 | .2545 | .2538 | .2532 | .2525 | .2519 | .2513 | .2506

4.00 |  .2500 | .2494 | .2488 | .2481 | .2475 | .2469 | .2463 | .2457 | .2451 | .2445
4.10 |  .2439 | .2433 | .2427 | .2421 | .2415 | .2410 | .2404 | .2398 | .2392 | .2387
4.20 |  .2381 | .2375 | .2370 | .2364 | .2358 | .2353 | .2347 | .2342 | .2336 | .2331
4.30 |  .2326 | .2320 | .2315 | .2309 | .2304 | .2299 | .2294 | .2288 | .2283 | .2278
4.40 |  .2273 | .2268 | .2262 | .2257 | .2252 | .2247 | .2242 | .2237 | .2232 | .2227

4.50 |  .2222 | .2217 | .2212 | .2208 | .2203 | .2198 | .2193 | .2188 | .2183 | .2179
4.60 |  .2174 | .2169 | .2164 | .2160 | .2155 | .2151 | .2146 | .2141 | .2137 | .2132
4.70 |  .2128 | .2123 | .2119 | .2114 | .2110 | .2105 | .2101 | .2096 | .2092 | .2088
4.80 |  .2083 | .2079 | .2075 | .2070 | .2066 | .2062 | .2058 | .2053 | .2049 | .2045
4.90 |  .2041 | .2037 | .2033 | .2028 | .2024 | .2020 | .2016 | .2012 | .2008 | .2004

5.00 |  .2000 | .1996 | .1992 | .1988 | .1984 | .1980 | .1976 | .1972 | .1968 | .1965
5.10 |  .1961 | .1957 | .1953 | .1949 | .1946 | .1942 | .1938 | .1934 | .1930 | .1927
5.20 |  .1923 | .1919 | .1916 | .1912 | .1908 | .1905 | .1901 | .1898 | .1894 | .1890
5.30 |  .1887 | .1883 | .1880 | .1876 | .1873 | .1869 | .1866 | .1862 | .1859 | .1855
5.40 |  .1852 | .1848 | .1845 | .1842 | .1838 | .1835 | .1832 | .1828 | .1825 | .1821

¹ Source: WAUGH, ALBERT E., Laboratory Manual and Problems for Elements of Statistical Method (McGraw-Hill Book Company, Inc., 1944).
TABLE V.—RECIPROCALS OF NUMBERS.—(Continued)

  N  |   .00 |   .01 |   .02 |   .03 |   .04
5.50 | .1818 | .1815 | .1812 | .1808 | .1805
5.60 | .1786 | .1783 | .1779 | .1776 | .1773
5.70 | .1754 | .1751 | .1748 | .1745 | .1742
5.80 | .1724 | .1721 | .1718 | .1715 | .1712
5.90 | .1695 | .1692 | .1689 | .1686 | .1684

6.00 | .1667 | .1664 | .1661 | .1658 | .1656
6.10 | .1639 | .1637 | .1634 | .1631 | .1629
6.20 | .1613 | .1610 | .1608 | .1605 | .1603
6.30 | .1587 | .1585 | .1582 | .1580 | .1577
6.40 | .1562 | .1560 | .1558 | .1555 | .1553

6.50 | .1538 | .1536 | .1534 | .1531 | .1529
6.60 | .1515 | .1513 | .1511 | .1508 | .1506
6.70 | .1493 | .1490 | .1488 | .1486 | .1484
6.80 | .1471 | .1468 | .1466 | .1464 | .1462
6.90 | .1449 | .1447 | .1445 | .1443 | .1441

7.00 | .1429 | .1427 | .1424 | .1422 | .1420
7.10 | .1408 | .1406 | .1404 | .1403 | .1401
7.20 | .1389 | .1387 | .1385 | .1383 | .1381
7.30 | .1370 | .1368 | .1366 | .1364 | .1362
7.40 | .1351 | .1350 | .1348 | .1346 | .1344

7.50 | .1333 | .1332 | .1330 | .1328 | .1326
7.60 | .1316 | .1314 | .1312 | .1311 | .1309
7.70 | .1299 | .1297 | .1295 | .1294 | .1292
7.80 | .1282 | .1280 | .1279 | .1277 | .1276
7.90 | .1266 | .1264 | .1263 | .1261 | .1259

8.00 | .1250 | .1248 | .1247 | .1245 | .1244
8.10 | .1235 | .1233 | .1232 | .1230 | .1228
8.20 | .1220 | .1218 | .1217 | .1215 | .1214
8.30 | .1205 | .1203 | .1202 | .1200 | .1199
8.40 | .1190 | .1189 | .1188 | .1186 | .1185

8.50 | .1176 | .1175 | .1174 | .1172 | .1171
8.60 | .1163 | .1161 | .1160 | .1159 | .1157
8.70 | .1149 | .1148 | .1147 | .1145 | .1144
8.80 | .1136 | .1135 | .1134 | .1132 | .1131
8.90 | .1124 | .1122 | .1121 | .1120 | .1119

9.00 | .1111 | .1110 | .1109 | .1107 | .1106
9.10 | .1099 | .1098 | .1096 | .1095 | .1094
9.20 | .1087 | .1086 | .1085 | .1083 | .1082
9.30 | .1075 | .1074 | .1073 | .1072 | .1071
9.40 | .1064 | .1063 | .1062 | .1060 | .1059

9.50 | .1053 | .1052 | .1050 | .1049 | .1048
9.60 | .1042 | .1041 | .1040 | .1038 | .1037
9.70 | .1031 | .1030 | .1029 | .1028 | .1027
9.80 | .1020 | .1019 | .1018 | .1017 | .1016
9.90 | .1010 | .1009 | .1008 | .1007 | .1006

TABLE VI.—AREAS, ORDINATES, AND DERIVATIVES OF THE NORMAL CURVE¹

The following table gives proportions of the area under the normal curve from x/σ = 0 to values of x/σ given in column (1). Values of the ordinates and of the third and fourth derivatives are also given.

  (1)  |  (2)  |   (3)    |   (4)*   |   (5)*
  x/σ  | Area  | Ordinate | φ3(x/σ)  | φ4(x/σ)
   .00 | .0000 |          |          |
   .01 | .0040 |          |          |
   .02 | .0080 |          |          |
   .03 | .0120 |          |          |
   .04 | .0160 |          |          |
   .05 | .0199 |          |          |
   .50 | .1915 |  .3521   |  .4841   |  0.5501
  1.00 | .3413 |  .2420   |  .4839   |  −.4839

¹ Reproduced by permission from Mathematical Tables from Handbook of Chemistry and Physics, compiled by Charles D. Hodgman, 7th ed., 1941, pp. 200-204.

* If the ordinate shown in column (3) is designated as $\varphi_0(x/\sigma)$, then $\varphi_3(x/\sigma)$ is defined as $\varphi_0(x/\sigma)$ multiplied by $\left(\frac{3x}{\sigma} - \frac{x^3}{\sigma^3}\right)$, and $\varphi_4(x/\sigma)$ is defined as $\varphi_0(x/\sigma)$ multiplied by $\left(\frac{x^4}{\sigma^4} - \frac{6x^2}{\sigma^2} + 3\right)$. By successive differentiation of $\varphi_0(x/\sigma)$ it can be shown that $\varphi_3(x/\sigma)$ is also the third derivative of $\varphi_0(x/\sigma)$ and $\varphi_4(x/\sigma)$ is the fourth derivative of $\varphi_0(x/\sigma)$ (see Chap. VII, pp. 142-145).
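The definitions in the footnote can be checked against the legible rows of the table. The sketch below assumes the unit normal curve. It computes the area from 0 to z = x/σ with the standard-library `math.erf`, the ordinate φ0(z) = e^(−z²/2)/√(2π), and φ3 and φ4 from the polynomial multipliers given above. A central finite difference then confirms numerically that φ3 is the third derivative of φ0. The function names and the step size h are illustrative assumptions.

```python
import math


def area_0_to_z(z: float) -> float:
    """Area under the unit normal curve between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


def phi0(z: float) -> float:
    """Ordinate of the unit normal curve at z."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def phi3(z: float) -> float:
    """phi0(z) multiplied by (3z - z^3); equals the third derivative of phi0."""
    return phi0(z) * (3.0 * z - z ** 3)


def phi4(z: float) -> float:
    """phi0(z) multiplied by (z^4 - 6z^2 + 3); equals the fourth derivative of phi0."""
    return phi0(z) * (z ** 4 - 6.0 * z ** 2 + 3.0)


def third_derivative_fd(f, z: float, h: float = 1e-2) -> float:
    """Central finite-difference estimate of f'''(z); the error is of order h**2."""
    return (f(z + 2 * h) - 2 * f(z + h) + 2 * f(z - h) - f(z - 2 * h)) / (2 * h ** 3)


if __name__ == "__main__":
    for z in (0.50, 1.00):
        print(f"{z:4.2f}  {area_0_to_z(z):.4f}  {phi0(z):.4f}  "
              f"{phi3(z):.4f}  {phi4(z):.4f}")
```

Rounded to four places, the results for z = .50 and z = 1.00 match the tabled values .1915, .3521, .4841, .5501 and .3413, .2420, .4839, −.4839.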
TABLE VI.—AREAS, ORDINATES, AND DERIVATIVES OF THE NORMAL CURVE¹ (Continued)


TABLE VI.—AREAS, ORDINATES, AND DERIVATIVES OF THE NORMAL CURVE¹ (Concluded)




TABLE VII.—TABLE OF t*

The column headings in this table are probabilities. The figures in the body of the table are values of t. The stub of the table shows n, the degrees of freedom. The probabilities refer to the sum of the two tails of the t distribution, i.e., to both + and − values of t. For example, for n = 10 the table shows t = 2.228 under probability .05. This means that P(|t| ≥ 2.228) equals .05, P(t ≥ +2.228) equals .025, and P(t ≤ −2.228) equals .025. The last row of the t table, for n = ∞, shows the t values that correspond to values obtained from the area table of the normal curve (Table VI).
  n  P = .9    .8     .7     .6     .5     .4      .3      .2      .1      .05      .02      .01
  1   .158   .325   .510   .727  1.000  1.376   1.963   3.078   6.314   12.706   31.821   63.657
  2   .142   .289   .445   .617   .816  1.061   1.386   1.886   2.920    4.303    6.965    9.925
  3   .137   .277   .424   .584   .765   .978   1.250   1.638   2.353    3.182    4.541    5.841
  4   .134   .271   .414   .569   .741   .941   1.190   1.533   2.132    2.776    3.747    4.604
  5   .132   .267   .408   .559   .727   .920   1.156   1.476   2.015    2.571    3.365    4.032
  6   .131   .265   .404   .553   .718   .906   1.134   1.440   1.943    2.447    3.143    3.707
  7   .130   .263   .402   .549   .711   .896   1.119   1.415   1.895    2.365    2.998    3.499
  8   .130   .262   .399   .546   .706   .889   1.108   1.397   1.860    2.306    2.896    3.355
  9   .129   .261   .398   .543   .703   .883   1.100   1.383   1.833    2.262    2.821    3.250
 10   .129   .260   .397   .542   .700   .879   1.093   1.372   1.812    2.228    2.764    3.169
 11   .129   .260   .396   .540   .697   .876   1.088   1.363   1.796    2.201    2.718    3.106
 12   .128   .259   .395   .539   .695   .873   1.083   1.356   1.782    2.179    2.681    3.055
 13   .128   .259   .394   .538   .694   .870   1.079   1.350   1.771    2.160    2.650    3.012
 14   .128   .258   .393   .537   .692   .868   1.076   1.345   1.761    2.145    2.624    2.977
 15   .128   .258   .393   .536   .691   .866   1.074   1.341   1.753    2.131    2.602    2.947
 16   .128   .258   .392   .535   .690   .865   1.071   1.337   1.746    2.120    2.583    2.921
 17   .128   .257   .392   .534   .689   .863   1.069   1.333   1.740    2.110    2.567    2.898
 18   .127   .257   .392   .534   .688   .862   1.067   1.330   1.734    2.101    2.552    2.878
 19   .127   .257   .391   .533   .688   .861   1.066   1.328   1.729    2.093    2.539    2.861
 20   .127   .257   .391   .533   .687   .860   1.064   1.325   1.725    2.086    2.528    2.845
 21   .127   .257   .391   .532   .686   .859   1.063   1.323   1.721    2.080    2.518    2.831
 22   .127   .256   .390   .532   .686   .858   1.061   1.321   1.717    2.074    2.508    2.819
 23   .127   .256   .390   .532   .685   .858   1.060   1.319   1.714    2.069    2.500    2.807
 24   .127   .256   .390   .531   .685   .857   1.059   1.318   1.711    2.064    2.492    2.797
 25   .127   .256   .390   .531   .684   .856   1.058   1.316   1.708    2.060    2.485    2.787
 26   .127   .256   .390   .531   .684   .856   1.058   1.315   1.706    2.056    2.479    2.779
 27   .127   .256   .389   .531   .684   .855   1.057   1.314   1.703    2.052    2.473    2.771
 28   .127   .256   .389   .530   .683   .855   1.056   1.313   1.701    2.048    2.467    2.763
 29   .127   .256   .389   .530   .683   .854   1.055   1.311   1.699    2.045    2.462    2.756
 30   .127   .256   .389   .530   .683   .854   1.055   1.310   1.697    2.042    2.457    2.750
  ∞  .12566 .25335 .38532 .52440 .67449 .84162 1.03643 1.28155 1.64485  1.95996  2.32634  2.57582


* Reprinted from Table IV of R. A. Fisher, Statistical Methods for Research Workers (Oliver and Boyd, Edinburgh), by kind permission of the author and publishers.
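The two-tailed reading of the t table can be checked numerically. The Python sketch below (standard library only) integrates the t density with Simpson's rule to obtain P(|t| > t0) for n degrees of freedom, and uses the inverse normal distribution for the n = ∞ row. The function names and the choice of 2,000 Simpson intervals are illustrative assumptions, not part of the original table.

```python
import math
from statistics import NormalDist


def t_density(x: float, n: int) -> float:
    """Density of Student's t with n degrees of freedom."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1 + x * x / n) ** (-(n + 1) / 2)


def two_tailed_p(t0: float, n: int, steps: int = 2000) -> float:
    """P(|t| > t0) = 1 - 2 * (integral of the density from 0 to t0).

    Assumption: Simpson's rule with an even number of steps is accurate
    enough for three-decimal checks against the table.
    """
    h = t0 / steps
    total = t_density(0.0, n) + t_density(t0, n)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ...
        total += weight * t_density(i * h, n)
    return 1 - 2 * (total * h / 3)


def normal_row(p: float) -> float:
    """Two-tailed normal deviate, i.e. the n = infinity row of the table."""
    return NormalDist().inv_cdf(1 - p / 2)


if __name__ == "__main__":
    print(round(two_tailed_p(2.228, 10), 3))  # about .05, as in the worked example
    print(round(normal_row(0.05), 5))         # 1.95996
```

For n = 2 the same function returns about .05 at t = 4.303, and `normal_row(0.5)` reproduces the .67449 in the last row.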
The column headings in this table are probabilities. The figures in the body of the table are values of χ². The stub of the table shows n, the degrees of freedom. The probability given is that of a value of χ² equal to or greater than the value specified. Thus for n = 10, P(χ² ≥ 3.940) is .95; and accordingly, by subtraction, P(χ² < 3.940) is 1.00 − .95 = .05.


  n  P = .99     .98     .95    .90    .80    .70    .50    .30    .20    .10    .05    .02    .01
  1  .000157 .000628  .00393  .0158  .0642   .148   .455  1.074  1.642  2.706  3.841  5.412  6.635
  2   .0201   .0404    .103   .211   .446   .713  1.386  2.408  3.219  4.605  5.991  7.824  9.210
  3    .115    .185    .352   .584  1.005  1.424  2.366  3.665  4.642  6.251  7.815  9.837 11.341
  4    .297    .429    .711  1.064  1.649  2.195  3.357  4.878  5.989  7.779  9.488 11.668 13.277
  5    .554    .752   1.145  1.610  2.343  3.000  4.351  6.064  7.289  9.236 11.070 13.388 15.086
  6    .872   1.134   1.635  2.204  3.070  3.828  5.348  7.231  8.558 10.645 12.592 15.033 16.812
  7   1.239   1.564   2.167  2.833  3.822  4.671  6.346  8.383  9.803 12.017 14.067 16.622 18.475
  8   1.646   2.032   2.733  3.490  4.594  5.527  7.344  9.524 11.030 13.362 15.507 18.168 20.090
  9   2.088   2.532   3.325  4.168  5.380  6.393  8.343 10.656 12.242 14.684 16.919 19.679 21.666
 10   2.558   3.059   3.940  4.865  6.179  7.267  9.342 11.781 13.442 15.987 18.307 21.161 23.209
 11   3.053   3.609   4.575  5.578  6.989  8.148 10.341 12.899 14.631 17.275 19.675 22.618 24.725
 12   3.571   4.178   5.226  6.304  7.807  9.034 11.340 14.011 15.812 18.549 21.026 24.054 26.217
 13   4.107   4.765   5.892  7.042  8.634  9.926 12.340 15.119 16.985 19.812 22.362 25.472 27.688
 14   4.660   5.368   6.571  7.790  9.467 10.821 13.339 16.222 18.151 21.064 23.685 26.873 29.141
 15   5.229   5.985   7.261  8.547 10.307 11.721 14.339 17.322 19.311 22.307 24.996 28.259 30.578
 16   5.812   6.614   7.962  9.312 11.152 12.624 15.338 18.418 20.465 23.542 26.296 29.633 32.000
 17   6.408   7.255   8.672 10.085 12.002 13.531 16.338 19.511 21.615 24.769 27.587 30.995 33.409
 18   7.015   7.906   9.390 10.865 12.857 14.440 17.338 20.601 22.760 25.989 28.869 32.346 34.805
 19   7.633   8.567  10.117 11.651 13.716 15.352 18.338 21.689 23.900 27.204 30.144 33.687 36.191
 20   8.260   9.237  10.851 12.443 14.578 16.266 19.337 22.775 25.038 28.412 31.410 35.020 37.566
 21   8.897   9.915  11.591 13.240 15.445 17.182 20.337 23.858 26.171 29.615 32.671 36.343 38.932
 22   9.542  10.600  12.338 14.041 16.314 18.101 21.337 24.939 27.301 30.813 33.924 37.659 40.289
 23  10.196  11.293  13.091 14.848 17.187 19.021 22.337 26.018 28.429 32.007 35.172 38.968 41.638
 24  10.856  11.992  13.848 15.659 18.062 19.943 23.337 27.096 29.553 33.196 36.415 40.270 42.980
 25  11.524  12.697  14.611 16.473 18.940 20.867 24.337 28.172 30.675 34.382 37.652 41.566 44.314
 26  12.198  13.409  15.379 17.292 19.820 21.792 25.336 29.246 31.795 35.563 38.885 42.856 45.642
 27  12.879  14.125  16.151 18.114 20.703 22.719 26.336 30.319 32.912 36.741 40.113 44.140 46.963
 28  13.565  14.847  16.928 18.939 21.588 23.647 27.336 31.391 34.027 37.916 41.337 45.419 48.278
 29  14.256  15.574  17.708 19.768 22.475 24.577 28.336 32.461 35.139 39.087 42.557 46.693 49.588
 30  14.953  16.306  18.493 20.599 23.364 25.508 29.336 33.530 36.250 40.256 43.773 47.962 50.892
For larger values of n, the expression $\sqrt{2\chi^2} - \sqrt{2n-1}$ may be used as a normal deviate with unit standard deviation.
* Reprinted from Table III of R. A. Fisher, Statistical Methods for Research Workers (Oliver and Boyd, Edinburgh), by kind permission of the author and publishers.
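Both the worked example and the large-n rule can be illustrated in a few lines of Python. For an even number of degrees of freedom, the upper-tail probability of χ² has the closed form $e^{-x/2}\sum_{k=0}^{n/2-1}(x/2)^k/k!$, which reproduces the n = 10 example above. The second function inverts the large-n rule by treating $\sqrt{2\chi^2} - \sqrt{2n-1}$ as a unit normal deviate. This is a sketch only: the closed form applies to even n alone, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def chi2_upper_even(x: float, n: int) -> float:
    """P(chi^2 >= x) for an even number n of degrees of freedom.

    Assumption: uses the identity P = exp(-x/2) * sum_{k=0}^{n/2-1} (x/2)**k / k!,
    which holds only for even n.
    """
    if n <= 0 or n % 2:
        raise ValueError("closed form holds only for positive even n")
    half = x / 2
    term = total = 1.0
    for k in range(1, n // 2):
        term *= half / k  # (x/2)**k / k!, built up incrementally
        total += term
    return math.exp(-half) * total


def chi2_point_large_n(p: float, n: int) -> float:
    """Approximate chi^2 exceeded with probability p, from the rule that
    sqrt(2*chi^2) - sqrt(2n - 1) is roughly a unit normal deviate."""
    z = NormalDist().inv_cdf(1 - p)
    return (z + math.sqrt(2 * n - 1)) ** 2 / 2


if __name__ == "__main__":
    print(round(chi2_upper_even(3.940, 10), 3))    # .95, as in the example
    print(round(chi2_point_large_n(0.05, 30), 2))  # compare 43.773 in the table
```

At n = 30 the approximation gives about 43.49 for the .05 point, against the tabled 43.773. This gap is a reminder that the rule is meant for values of n beyond those in the table.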
TABLE X.—5 PER CENT AND 10 PER CENT POINTS OF THE SAMPLING DISTRIBUTION OF √β1 FOR SAMPLES OF VARIOUS SIZES¹
(Approximate values)

Size of     Values of √β1 for P(√β1 ≥ )
sample          .10        .05
   25          .711      1.061
   30          .661       .982
   35          .621       .921
   40          .587       .869
   45          .558       .825
   50          .533       .787
   60          .492       .728
   70          .459       .673
   80          .432       .631
   90          .409       .596
  100          .389       .567

¹ Reproduced from P. Williams, "Note on the Sampling Distribution of √β1, Where the Population Is Normal," Biometrika, Vol. 27 (1935), pp. 269-271.
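The statistic tabulated in Table X can be computed from a sample as the ratio of the third moment about the mean to the 1.5 power of the second. The sketch below assumes moments about the sample mean with divisor N, the usual convention for this test of skewness. The function names and the toy data are illustrative only.

```python
def central_moment(xs, k):
    """k-th moment about the sample mean (assumption: divisor N, not N - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def sqrt_b1(xs):
    """Sample measure of skewness m3 / m2 ** 1.5; its sign gives the direction."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


print(sqrt_b1([1, 2, 3, 4, 5]))  # 0.0 for symmetrical data
print(sqrt_b1([0, 0, 0, 0, 1]))  # positive: long tail to the right
```

In use, the computed value for a sample of, say, 50 would be compared with the entries in the row for 50 (.533 and .787).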


TABLE XI.—LOWER AND UPPER 1 PER CENT AND 5 PER CENT POINTS OF THE SAMPLING DISTRIBUTION OF β2 FOR SAMPLES OF VARIOUS SIZES¹
(Approximate values)

                     Values of β2
Size of       For P(β2 ≤ )       For P(β2 ≥ )
sample        .01     .05        .05     .01
   100       2.18    2.35       3.77    4.39
   125       2.24    2.40       3.70    4.24
   150       2.29    2.45       3.65     …
   175       2.33    2.48       3.61    4.05
   200       2.37    2.51       3.57    3.98
   250       2.42    2.55       3.52    3.87
   300       2.46    2.59       3.47    3.79
   400       2.52    2.64       3.41    3.67
   500       2.57    2.67       3.37    3.60
   800       2.65    2.74       3.29    3.46
 1,000       2.68    2.76       3.26    3.41
 2,000       2.77    2.83       3.18    3.28
 5,000       2.85    2.89       3.12    3.17

¹ E. S. Pearson, "A Further Development of Tests for Normality," Biometrika, Vol. 22 (1930-1931), pp. 239-249. Table assumes population to be normal.
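Similarly, the sample β2 is the fourth moment about the mean divided by the square of the second. The sketch below assumes deviations from the sample mean with divisor N. It flags a sample of 100 whose β2 falls outside the lower and upper 5 per cent points read from the row for 100 in Table XI (2.35 and 3.77). The helper names are hypothetical.

```python
def central_moment(xs, k):
    """k-th moment about the sample mean (assumption: divisor N, not N - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def b2(xs):
    """Sample measure of kurtosis m4 / m2 ** 2 (population value 3 if normal)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


# Lower and upper 5 per cent points for samples of 100, read from Table XI.
LOWER_05, UPPER_05 = 2.35, 3.77


def departs_from_normality(xs):
    """True if b2 lies in either 5 per cent tail; valid only for N = 100."""
    if len(xs) != 100:
        raise ValueError("these points apply to samples of 100")
    value = b2(xs)
    return value < LOWER_05 or value > UPPER_05


print(b2([1, 2, 3, 4, 5]))                      # flat data: well below 3
print(departs_from_normality(list(range(100))))  # evenly spread values
```

Evenly spread values give a β2 near 1.8, which lies far inside the lower tail. This is why `list(range(100))` is flagged as a departure from normality.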
TABLE XII.—LOWER AND UPPER 1 PER CENT, 5 PER CENT, AND 10 PER CENT POINTS OF SAMPLING DISTRIBUTION OF a = A.D./σ FOR SAMPLES OF VARIOUS SIZES¹
(Approximate values)

                          Values of a
  n*        For P(a ≤ )                 For P(a ≥ )
         .01     .05     .10         .10     .05     .01
  10   .6675   .7153   .7409       .8899   .9073   .9359
  15   .6829   .7236   .7452       .8788   .8884   .9187
  20   .6950   .7304   .7495       .8681   .8768   .9001
  25   .7040   .7360   .7530       .8570   .8686   .8901
  30   .7110   .7404   .7559       .8511   .8625   .8827
  35   .7167   .7440   .7583       .8468   .8578   .8769
  40   .7216   .7470   .7604       .8436   .8540   .8722
  45   .7256   .7496   .7621       .8409   .8508   .8682
  50   .7291   .7518   .7636       .8385   .8481   .8648
  60   .7347   .7554   .7662       .8349   .8434   .8592
  70   .7393   .7583   .7683       .8321   .8403   .8549
  80   .7430   .7607   .7700       .8298   .8376   .8515
  90   .7460   .7626   .7714       .8279   .8353   .8484
 100   .7487   .7644   .7726       .8264   .8344   .8460
 200   .7629   .7738   .7796       .8178   .8229   .8322
 300   .7693   .7781   .7828       .8140   .8183   .8260

¹ Abridged from R. C. Geary, "Moments of the Ratio of the Mean Deviation to the Standard Deviation for Normal Samples," Biometrika, Vol. 28 (1936), pp. 295-307.

* The size of sample is N, and n = …
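The ratio a in Table XII can be computed directly from a sample. The sketch below assumes that both the average (mean) deviation and the standard deviation are taken about the sample mean with divisor N. The footnote's definition of n should be checked before comparing a computed value with a row of the table. For a normal population the ratio tends to √(2/π) ≈ .7979 as the sample grows, and the tabled limits close in on this value. The function name and data are illustrative.

```python
import math


def geary_a(xs):
    """Ratio of the average (mean) deviation to the standard deviation.

    Assumption: both are measured about the sample mean with divisor N.
    """
    n = len(xs)
    mean = sum(xs) / n
    mean_dev = sum(abs(x - mean) for x in xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return mean_dev / sd


NORMAL_LIMIT = math.sqrt(2 / math.pi)  # large-sample value for a normal population

print(round(geary_a([1, 2, 3, 4, 5]), 4))  # 1.2 / sqrt(2) = 0.8485
print(round(NORMAL_LIMIT, 4))              # 0.7979
```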
TABLE XIII.—SAMPLING DISTRIBUTION OF THE RANGE w = (X_N − X_1)/σ*

                                          Lower tail of distribution           Upper tail of distribution
  N   Mean w  1/Mean w†  S.D. of w   .001  .005  .010  .025  .050  .100    .100  .050  .025  .010  .005  .001
  2   1.128     .886       .853      .00   .01   .02   .04
  3   1.693     .591       .888      .06   .13   .19   .30   .43
  4   2.059     .486       .880      .20   .34   .43   .59
  5   2.326     .430       .864      .37         .66   .85
  6   2.534     .395       .848      .54   .75   .87  1.06
  7   2.704     .370       .833      .69   .92  1.05  1.25
  8   2.847     .351       .820      .83  1.08  1.20  1.41
  9   2.970     .337       .808      .96  1.21  1.34  1.55
 10   3.078     .325       .797     1.08  1.33  1.47
 11   3.173     .315       .787     1.20  1.45  1.58
 12   3.258     .307       .778     1.30  1.55  1.68
 13   3.336     .300       .770     1.38  1.64  1.77
 14   3.407     .294       .762     1.47  1.72  1.86
 15   3.472     .288       .755     1.55  1.80  1.93
 16   3.532     .283       .749     1.63  1.88  2.01
 17   3.588     .279       .743     1.69  1.94  2.07
 18   3.640     .275       .738     1.75  2.01  2.14
 19   3.689     .271       .733     1.82  2.07  2.20
 20   3.735     .268                1.88  2.13  2.25


* Reproduced by permission from E. S. Pearson, "The Probability Integral of the Range in Samples of N Observations from a Normal Population," Biometrika, Vol. 32 (1941-1942), pp. 301-308; and "The Percentage Limits for the Distribution of Range in Samples from a Normal Population," ibid., Vol. 24 (1932), pp. 404-417. Table 2 in Biometrika, Vol. 32 (1941-1942), p. 308, has been extended from N = 12 to N = 20 by interpolation in Table 1, pp. 302-307, of the same article. The values for the mean w and the standard deviation of w were obtained from a table in Biometrika, Vol. 24 (1932), p. 416. It is to be noted that X_N refers to the largest and X_1 to the smallest X in the sample.


In 1925, tables giving the expected or mean value and the standard deviation of range in random samples from a normal population were calculated by L. H. C. Tippett, Department …, University College, London. Since the probability distribution of range is not normal in form, it was evident that its mean and standard deviation did not give all the information generally needed in practice. Tippett included … The actual method of computation … and the calculations were carried out under his … Computing Service, Ltd. The …, like that of Table XIII, was limited to N = 20. As N increases there is an increasing risk that the table may be misleading in practice, since the distribution of w becomes very sensitive to relatively slight departures from normality in the tails of the population distribution. (E. S. Pearson, "The Probability Integral of the Range in Samples of N Observations from a Normal Population," Biometrika, Vol. 32 (1941-1942), p. 308.)

† This column is the reciprocal of the mean w.
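In quality control, the average range of several small samples is often converted into an estimate of σ by dividing it by the mean of w for the sample size. The sketch below does this with entries copied from the "Mean w" column of Table XIII for N = 2 to 5. The function name and sample data are hypothetical, and the method assumes normal samples that all have the same size. For N = 2 the tabled 1.128 agrees with the exact value 2/√π.

```python
import math

# Mean of w = (largest - smallest) / sigma in normal samples of size N,
# copied from the "Mean w" column of Table XIII.
MEAN_W = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def sigma_from_ranges(samples):
    """Estimate sigma as (average sample range) / (mean w for N).

    Assumption: every sample comes from the same normal population
    and has the same size N.
    """
    size = len(samples[0])
    if any(len(s) != size for s in samples):
        raise ValueError("all samples must have the same size N")
    average_range = sum(max(s) - min(s) for s in samples) / len(samples)
    return average_range / MEAN_W[size]


print(round(2 / math.sqrt(math.pi), 3))  # 1.128, the tabled mean w for N = 2
print(sigma_from_ranges([[10.1, 10.4, 9.8, 10.0, 10.3],
                         [9.9, 10.2, 10.0, 10.5, 9.7]]))  # hypothetical data
```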
TABLE XIV.—HYPERBOLIC TANGENTS¹

  z   r = tanh z |  z   r = tanh z |  z   r = tanh z
0.00  0.00000    | 0.55            | 1.10  0.80050
0.01  .01000     | 0.56            | 1.11  .80406
0.02             | 0.57            | 1.12  .80757
0.03             | 0.58            | 1.13  .81102
0.04             | 0.59            | 1.14  .81441
0.05             | 0.60            | 1.15  0.81775
0.06             | 0.61            | 1.16  .82104
0.07             | 0.62            | 1.17  .82427
0.08             | 0.63            | 1.18  .82745
0.09             | 0.64            | 1.19  .83058
0.10             | 0.65            | 1.20  0.83365
0.11             | 0.66            | 1.21  .83668
0.12             | 0.67            | 1.22  .83965
0.13             | 0.68            | 1.23  .84258
0.14             | 0.69            | 1.24  .84546
0.15             | 0.70  0.60437   | 1.25  0.84828
0.16             | 0.71  .61068    | 1.26  .85106
0.17             | 0.72            | 1.27  .85380
0.18             | 0.73            | 1.28  .85648
0.19             | 0.74            | 1.29  .85913
0.20             | 0.75            | 1.30  0.86172
0.21             | 0.76            | 1.31  .86428
0.22             | 0.77            | 1.32  .86678
0.23             | 0.78            | 1.33  .86925
0.24             | 0.79            | 1.34  .87167
0.25             | 0.80  0.66404   | 1.35  0.87405
0.26             | 0.81  .66959    | 1.36  .87639
0.27             | 0.82  .67507    | 1.37  .87869
0.28  .27291     | 0.83  .68048    | 1.38  .88095
0.29  .28213     | 0.84  .68581    | 1.39  .88317
0.30             | 0.85  0.69107   | 1.40  0.88535
0.31             | 0.86  .69626    | 1.41  .88749
0.32             | 0.87            | 1.42  .88960
0.33             | 0.88            | 1.43  .89167
0.34             | 0.89            | 1.44  .89370
0.35             | 0.90            | 1.45  0.89569
0.36             | 0.91            | 1.46  .89765
0.37             | 0.92            | 1.47  .89958
0.38             | 0.93            | 1.48  .90147
0.39             | 0.94            | 1.49  .90332
0.40             | 0.95            | 1.50  0.90515
0.41             | 0.96            | 1.51  .90694
0.42             | 0.97            | 1.52  .90870
0.43             | 0.98            | 1.53  .91042
0.44             | 0.99            | 1.54  .91212
0.45             | 1.00  0.76159   | 1.55  0.91379
0.46             | 1.01  .76576    | 1.56  .91542
0.47             | 1.02            | 1.57  .91703
0.48             | 1.03            | 1.58  .91860
0.49             | 1.04            | 1.59  .92015
0.50             | 1.05  0.78181   | 1.60  0.92167
0.51             | 1.06  .78566    | 1.61  .92316
0.52             | 1.07  .78946    | 1.62  .92462
0.53             | 1.08  .79320    | 1.63  .92606
0.54             | 1.09  .79688    | 1.64  .92747

¹ HODGMAN, CHARLES D., Mathematical Tables from Handbook of Chemistry and Physics (1941).
TABLE XIV.—HYPERBOLIC TANGENTS.—(Continued)

  z   r = tanh z |  z   r = tanh z |  z   r = tanh z
1.65  0.92886    | 2.20            | 2.75
1.66  .93022     | 2.21            | 2.76
1.67  .93155     | 2.22            | 2.77
1.68  .93286     | 2.23            | 2.78
1.69  .93415     | 2.24            | 2.79
1.70  0.93541    | 2.25  0.97803   | 2.80
1.71  .93665     | 2.26  .97846    | 2.81
1.72  .93786     | 2.27  .97888    | 2.82  .99292
1.73  .93906     | 2.28  .97929    | 2.83  .99306
1.74  .94023     | 2.29  .97970    | 2.84  .99320
1.75  0.94138    | 2.30  0.98010   | 2.85  0.99333
1.76             | 2.31  .98049    | 2.86  .99346
1.77             | 2.32  .98087    | 2.87  .99359
1.78             | 2.33  .98124    | 2.88  .99372
1.79             | 2.34  .98161    | 2.89  .99384
1.80             | 2.35  0.98197   | 2.90  0.99396
1.81             | 2.36  .98233    | 2.91  .99408
1.82             | 2.37  .98267    | 2.92  .99420
1.83             | 2.38  .98301    | 2.93  .99431
1.84             | 2.39  .98335    | 2.94  .99443
1.85             | 2.40  0.98367   | 2.95  0.99454
1.86             | 2.41  .98400    | 2.96  .99464
1.87             | 2.42  .98431    | 2.97  .99475
1.88             | 2.43  .98462    | 2.98  .99485
1.89             | 2.44  .98492    | 2.99  .99496
1.90             | 2.45  0.98522   | 3.0   0.99505
1.91             | 2.46  .98551    | 3.1   .99595
1.92             | 2.47  .98579    | 3.2
1.93             | 2.48  .98607    | 3.3
1.94             | 2.49  .98635    | 3.4
1.95             | 2.50  0.98661   | 3.5
1.96             | 2.51  .98688    | 3.6
1.97             | 2.52  .98714    | 3.7
1.98             | 2.53  .98739    | 3.8
1.99             | 2.54  .98764    | 3.9
2.00             | 2.55  0.98788   | 4.0
2.01             | 2.56  .98812    | 4.1
2.02             | 2.57  .98835    | 4.2
2.03             | 2.58  .98858    | 4.3
2.04             | 2.59  .98881    | 4.4
2.05             | 2.60  0.98903   | 4.5
2.06             | 2.61  .98924    | 4.6
2.07             | 2.62  .98946    | 4.7
2.08             | 2.63  .98966    | 4.8
2.09             | 2.64  .98987    | 4.9
2.10             | 2.65  0.99007   |
2.11             | 2.66  .99026    |
2.12             | 2.67  .99045    |
2.13             | 2.68            |
2.14             | 2.69            |
2.15             | 2.70            |
2.16             | 2.71            |
2.17             | 2.72            |
2.18             | 2.73            |
2.19  .97526     | 2.74            |
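A common use of a table of r = tanh z is R. A. Fisher's transformation of a sample correlation coefficient r into z = tanh⁻¹ r. For samples of N pairs from a normal bivariate population, the sampling distribution of z is close to normal, with a standard deviation of approximately 1/√(N − 3). Confidence limits are set on z and carried back to r through r = tanh z, which is the conversion this table supplies. The sketch below uses Python's `math.atanh` and `math.tanh` in place of the table. The function name and the default .05 deviate (1.95996, from the n = ∞ row of Table VII) are illustrative. The 1/√(N − 3) standard error is the usual large-sample approximation and is assumed here, not stated in this table.

```python
import math


def r_confidence_limits(r, n_pairs, z_crit=1.95996):
    """Approximate confidence limits for a population correlation coefficient.

    Transforms r to z = atanh(r), forms z +/- z_crit / sqrt(N - 3)
    (assumed large-sample standard error), and maps both limits back
    with r = tanh(z).
    """
    if n_pairs <= 3:
        raise ValueError("need more than 3 pairs")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n_pairs - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


print(round(math.tanh(1.00), 5))     # 0.76159, as in the table
print(r_confidence_limits(0.5, 28))  # limits are not symmetric about r
```

Because tanh flattens out as z grows, the upper limit lies closer to r than the lower one does when r is positive.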
SUBJECT INDEX


A 


Analysis of variance, chance varia-
tion, measure of, more
than one basis of classifica- 
tion, more than one case 
in each class, 435-436 

single case in each class, 434 
one basis of classification, 434 
F distribution, use of, more than 
one basis of classification, 
more than one case in each 
class, 437-438, 441-442 
one case in each class, 431, 
433 
selection of region of rejection, 
428 
single basis of classification, 425 
testing correlation coefficients,
etc., 443
testing linearity, 444 
"interaction," 435-436 
more than one basis of classifica- 
tion, more than one case 
in each class, analysis of 
variance table, 441 
numerical analysis, 438-442 
study of results, 441-442
theoretical basis, 435-438
single case in each class, analy-
sis of variance table, 433
comparison of results with
those of single basis, 434
numerical analysis, 431-433
theoretical basis, 429-431
nonnormal population, 449-450
remainder variance, more than
one case in each class, 435-
436
one case in each class, 429-433


Analysis of variance, single basis 
of classification, numerical 
analysis, 425-428 

theoretical basis, 423-425 
worksheet for calculating sums 
of squares, 427 
tests of correlation coefficients 
as, 442-444 
tests of linearity, 444 
Asymmetrical binomial distribution, 
betas, formulas for, 45 
derivation, 67 
characteristics, 44-46
derivation, 40-44 
as distribution of sample per- 
centages, 190 
effect of changing N, 49
formula, 43 
graphs, 44, 45 
mean, formula for, 45 
derivation, 65-66 
mode, formula for, 45 
derivation, 68 
and the normal curve, 
68-74 
numerical examples, 42, 43 
and Pearson's type III curve,
47-50 
relative slope, 76-77 
second approximation, 72-73 
standard deviation, formula for, 
45 
derivation, 66-67 

Average deviation, 12-13 

Averages (see individual titles such
as: Mean, Median, and Mode)


B 


Beta coefficients, β1 and β2, 10-11,
46-47
sampling distribution of √β1,
242-244
table of .05 and .10 points, 480
sampling distribution of β2, 242-
245
table of .01 and .05 points, 480
standard error of √β1, 242, 244
standard error of β2, 242, 244
Binomial distribution (see Sym- 
metrical binomial distribution; 
Asymmetrical binomial dis- 
tribution) 
Binomial expansion, 25-26 
Bivariate frequency distribution, 
13-22 
Boldfaced type, 10 
Breve, 10 


C

Calculus of Observations, The, 93
Camp, B. H., 454
Chi square (χ²) distribution (see
Frequency curves, chi square
(χ²) distribution)
Chi square test (see Frequency
curves, testing goodness of fit,
by χ² test)
Coefficient, of correlation (see Cor-
relation, coefficient of)
of multiple correlation (see Mul-
tiple correlation, coefficient
of)
of risk (see Statistical inference,
testing hypotheses, coefficient
of risk)
Combinations, 24-25
Confidence coefficient (see Statis-
tical inference, confidence in-
tervals, confidence coefficient)
Confidence intervals (see Statistical
inference, confidence intervals)
Contingency tables, 337, 422n.
(See also Independence, test of)
two-fold classification, 395-397
Correlation, coefficient of, Pear-
sonian, 19
and the breakup of variance, 20-21
confidence limits for, 301-302
Correlation, coefficient of, Pear-
sonian, maximum likelihood 
estimate of, 302-303 

as measure of goodness of fit, 20 
as measure of proportion of total
variance, 21
relationship to first-order variance,
19-20
sampling distribution, 298-300 
significance, 21 
testing hypotheses about, 300-301

Correlation index, testing signifi-
cance of, 306

Correlation ratio, 21-22 

testing significance of, 306 

Cumulants, of contributory causes, 
relationship to cumulants of 
variable, 83, 94-96 

definition of, 83 
as semi-invariants, 83 


D 


Danish census (1923), 184 
Deciles, 8
Degrees of freedom, in analysis of 
variance, 424, 434, 441 
in a contingeney table, 337 
explained, 314, 318-319 
fitting a frequency curve, 332-333 
identified with n, 327-328
Density of samples, in distribution 
of all possible samples, 256-258 
Dependent variable, estimation of, 
from sample line or plane 
of regression, 387-388
with allowance for sampling
errors, 388-389 
Design of experiments, 162 
Dispersion, subnormal and super-
normal, 422n.


E 
Elderton, W. P., 128, 134, 148 
Ezekiel, M., 305 

F 


F distribution (see Frequency curves,
F distribution)
First-order standard deviation, 18-
19 (See also Higher-order vari- 
ances) 

definition, 18 
relationship to r, 19-20 

Fisher, Arne, 92 

Fisher, R. A., 11, 114n., 242, 299

Flat space, 253 

Frequency curves, chi square (χ²)
distribution, description, 111-
112
formula, 111 
graphs, 112 
mean, 112 
mode, 112 
probabilities, table of, 475 
standard deviation, 112 
explanation of, 3-5 
F distribution, description, 112-
114 
formula, 113 
graph, 113 
mean, 113 
mode, 113 
probabilities, table of, 476-479 
standard deviation, 113 
fitting of, Gram-Charlier curves, 
133-134 
nonnormal curves, 132-137 
normal curve, 131-132
Pearsonian curves, 134-137
sampling curves, 137 
Sheppard’s corrections, 10, 12, 
131-132, 134 
Gram-Charlier (see Gram-Charlier
frequency curves)
graph, 4 
graphing of, labeling of vertical
scale, 110n.
normal curve, 115-116
other curves, 118-119
t, χ² and F curves, 116-117
nonnormal, examples from every-
day life, 105-106
examples from sampling analy-
sis, 107-114
normal (see Normal frequency
curves)

Frequency curves, Pearsonian (see
Pearsonian frequency curves)
probabilities, computation of, 119-
120
for χ² curve, 125-126
for F curve, 127 
for normal curve, 120-123 
for other curves, 127-131 
for t curve, 123-124 
by quadrature formulas, 128 
as probability distributions, 28 
sampling distributions (see Sam- 
pling distribution) 
t distribution, approximation by
normal curve for n between
30 and 100, 401n.
description, 109-110 
formula, 111 
graph, 110 
kurtosis, 111 
mean, 111 
probabilities, table of, 474 
variance, 111 
testing goodness of fit, 331-333 
by χ² test, Gram-Charlier
curves, 144-145
normal curve, 139-142
Pearsonian curves, 151, 152
by graphic comparison, Gram-
Charlier curves, 142-144
normal curve, 137-139
Pearsonian curves, 146-151 
theory of, conditions leading to 
nonnormality, 100-105 
conditions leading to normality, 
37-39, 100 
contributory causes, Gram- 
Charlier curves, 83-84, 90- 
92, 93-98 
nonnormal curves in general, 
100-103 
normal curve, 35-39, 100 
Pearsonian curves, 60-65 
sampling distribution of the 
mean, 107-108 
summary, 100-105 
z distribution, 114n.
Frequency Curves and Correlation,
134, 148
Frequency distributions, 1-5 
bivariate, 13-14 
illustration of, 13 
Frequency series, continuous, 1-2 
discrete, 1 
Fourier integral theorem, 93, 97 


G 


Galton, Francis, 18 
Gamma function, 78-79n., 254-255 
Goodness of fit, of frequency curves 
(see Frequency curves, testing 
goodness of fit) 
Gram-Charlier frequency curves, 
assumptions, 82-83 
comparison with Pearsonian 
curves, 90-92 
comparison with Pearsonian type
III curve, 82
components of, 85-90 
combinations of, 86, 87, 88, 89 
graphs, 85, 87 
derivation, 82-85, 92-99 
formula for type A, 84 
general significance, 90-92
probabilities, computation of, 145-
146
relationship to asymmetrical bi-
nomial distribution, 91-92
type B, 90 
sampling distributions, 242-243
standard errors, 243 

Guldberg, M. A., 454


H 


Higher-order variances, confidence 
limits, 386-387 
definition, 18 
maximum-likelihood estimates,
387
sampling distribution, 385-386
testing hypotheses about, 386
use in estimating dependent vari-
ables, 387-390


Histogram, 2-3 
graph, 2 
of relative frequencies, graph, 4 
Homogeneity, meaning, 101-103 
test of, 338-339 
Hotelling, H., 452 
Hyperbolic tangents (table), 483- 
484 
Hypergeometrical distribution, char- 
acteristics, 55 
criterion for, 81 
derivation, 51-55 
as distribution of sample per- 
centages, 210-211 
formula, 54 
graph, 55 
mean, 55 
moments, 56 
numerical example, 53 
and the Pearsonian curves, 57-58 
relative slope, 79-81 
Hypergeometrical series, 54n. 
Hyperplane, 253 
Hypersphere, 254
Hypotheses, testing of (see Statisti- 
cal inference, testing hypotheses) 


I 


Incomplete gamma function, 255 
Independence, meaning of, 334-335 
test of, distribution of
$\sum (N_i - Np_i)^2/Np_i$,
use of, 337-338
estimating population percent- 
ages, 335-337 
the null hypothesis, 334 
the problem, 333-334 


J 
Journal of the Royal Statistical 
Society, 159 
K 


Kendall, M. G., 159, 160
k Statistics, 11-12, 242-243 
Kurtosis, 11 
L


Law, of large numbers, 27-28 
of small chances, 212 
of small numbers, 212 
Least squares, method of, 15, 374 
minimizing horizontal deviations, 
illustrated, 16 
minimizing vertical deviations,
illustrated, 15
Leptokurtic distributions, 11
Lexis' analysis, 422n.
Likelihood ratio, 345
Linear function, sampling distri-
bution, 108-109
Lines of regression, 16-18
Literary Digest poll, 157
Logarithms, of factorials, 117
of numbers, four-place common
(table), 457-460
representation by an infinite series,
70-71n.


M 
Mathematical Statistics, 92 
Mathematical Theory of Probabilities, 
92 
Maximum-likelihood estimates (see 
Statistical inference, maximum 
likelihood estimates of popula- 
tion parameters) 
Mean, arithmetic, calculation, 6-7 
definition, 5-6 
joint sampling distribution of
mean and standard deviation,
340-344
sampling distribution, any pop-
ulation, 107-108
normal population, 231, 262-
263
special nonnormal popula-
tions, 445-446
standard error, 231, 268
geometric, 9
harmonic, 9
Mean value, of a sum or difference,
formula, 392
proof of formula, 419-420
Means, progressions of, 16
charts, 14, 15 
Median, definition, 7 
sampling distribution, 241-242 
standard error, 242 
Meidel, M. B., 454 
Methods of Correlation Analysis, 305 
Mode, 8 
Moment coefficients, 9 
Moment generating function, 93-94 
Moments, 9 
Most powerful test, 198n.
Multinomial distribution, 309-323 
(See also Random sampling, 
from a discrete manifold 
population, distribution of 
sample percentages, and test- 
ing hypotheses about $p_i$'s)
Multinomial expansion, 26 
Multiple correlation, coefficient of, 
confidence limits for, 305 
maximum-likelihood estimate, 
305-306 
testing significance of, 305 
Multivariate frequency distribution, 
13-14 


N 


Narumi, S., 454 
N-dimensional geometry, 253-254 
Neyman, J., 353, 362 
Nonnormality, problem of, 445-455 
indirect attacks on, 451-455
qualitative or semiqualitative
methods, 451-453
Tchébychef’s and other in-
equalities, 453-455
transformation of the data, 451
sampling distributions of various
statistics for specific non-
normal populations, 445-448
use of normal theory for non-
normal populations, in case
of analysis of variance, 449-
450
in case of correlation coeffi-
cients, 451
Nonnormality, use of normal theory
for nonnormal populations,
in case of means, 448-449
in case of $\sqrt{N}(\bar{X} - X)/s$, 449
in case of variances, 450 
Normal frequency curve, areas under 
(table), 469-473 
characteristics, 10, 12 
conditions leading to, 37-39 
derivation, 34-35 
derivatives of (table of з and 
өл), 469-473 
first approximation to binomial 
distribution, 73 
fitting of, 131-132 
formula, 35 
graph, 121 
graphing of, 115-116 
ordinates of (table), 469-473 
probabilities, computation of, 123- 
124 
significance, 35-37 
standard form, 137-139 
testing goodness of fit, by χ² test,
139-142
by graphic comparison, 137-139
truncated, 104-105
Normality, determining departure 
from, by fitting a normal 
curve, 131-132, 137-142 
by special statistics, 242-245, 
296-297 
Null hypothesis, in analysis of 
variance, 423-424, 430-431, 436 
in comparing two samples, 391-
392 
in testing difference between two 
sample means, 398 
percentages, 393 
variances, 407-408 
in testing independence, 334 


P


Pabst, Margaret R., 452 

Parameters, 5, 10 

Partial correlation, coefficient of, 
inferences about, 303-304 




Pearson, E. S., 353, 362 

Pearson, Karl, 57, 58, 64, 134, 148, 
158, 255

Pearsonian coefficient of correlation 
(see Correlation, coefficient of, 
Pearsonian) 

Pearsonian frequency curves, equa- 
tions for, determination of, 
146-151 

as an explanation of nonnormal 
curves, 65 
general formula, 134
and the hypergeometrical dis- 
tribution, 57-58 
computation of probabilities, 151, 
152 
relative slope, formula for, 57 
types, chart for distinguishing, 136 
criterion for distinguishing, 58, 
135 
main, 58-60 
table of, 135 
transitional, 60 
type I, graph, 59
type III, derivation, 48-50,
76-79 
formula, 49 
graph, 49 
type IV, graph, 59 
Percentage, sampling distribution, 
190-194, 210, 211-212 
standard error, 190, 195, 210-211,
215, 315, 319
statistical inferences about (see
Random sampling, from a
discrete twofold population;
Random sampling, from a


discrete manifold population) 
Percentiles, 8 


Permutations, 23 
Platykurtic distributions, 11 
Poisson distribution, betas, 215, 217 

derivation, 215-216 

formula, 212 

mean, 215 

derivation, 217 
moments, 217-218 
Poisson distribution, and the normal
distribution, 215, 217 
as sampling distribution of per- 
centages, 211-213 
testing hypotheses with, 213-215 
variance, 215 
derivation, 217-218 
Polls, public opinion (see Random 
sampling, of public opinion) 
Polygons of regression, 16 
Population, bivariate, samples from,
298
meaning of, 5 
types, 154-155 
Power of a test, 198n. 

Probability, definition of mathe- 
matical probability, 26-27 
dependent probabilities, 30-31 

empirical approximated, 27 
equations involving, 28-29 
independent probabilities, 30 
and law of large numbers, 27-28 
probability set, 26-27, 30 
Probability calculus, addition the-
orem, 29 
meaning of mutually exclusive in, 
29 
multiplication theorem in, 30 
dependent probabilities, 30-31 
independent probabilities, 30 
Probability curves (see Frequency
curves) 
Probability distributions, definition, 
28 
Probability set, 26, 30 
derived or second order, 30 
Progressions of the means (see
Means, progressions of) 
Public opinion polls (see Random 
sampling, of public opinion) 


Q 


Quadrature formulas, 128 

Quality control, use of range in, 
295-296 

Quartiles, 7-8 
R


Random sampling, from a discrete 
manifold population, assump- 
tions, 308-309 

confidence zone for $p_i$'s, 328-329
distribution of sample
$\sum (N_i - Np_i)^2/Np_i$,
explanation, 323-326
distribution of sample per-
centages, derivation, 309- 
312 
formula, 311 
illustration of a skewed dis- 
tribution, 317-319 
illustration of symmetrical 
distribution, 312-314 
mean of, 314-317 
standard deviation of, 314- 
317 
maximum-likelihood estimates
of $p_i$'s, 329-330
testing hypotheses about $p_i$'s,
using distribution of
$\sum (N_i - Np_i)^2/Np_i$, 326-328, 330-
331
using multinomial distribu- 
tion, 319-323 
from a discrete twofold popula- 
tion, assumptions, 186-187 
confidence coefficient and the 
confidence interval, 207-208 
distribution of difference be- 
tween two sample percent- 
ages, 393-394 
distribution of sample per- 
centages, derivation, 188- 
190 
formula, 190 
and the population percent- 
age, 191-194
and the size of the sample,
190-191 
estimation of population per- 
centage, confidence inter- 
vals, 202-207 
Random sampling, from a discrete
twofold population, esti- 
mation of population per- 
centage, maximum likeli- 
hood estimate, 208-209 

size of sample and the con- 
fidence interval, 207 
small population percentage, 
211-215 
small populations, 209-211 
testing the difference between 
two sample percentages, 
392-395 
alternative method, 395-397 
testing hypotheses, coefficient 
of risk, 195 
regions of rejection, 195-201 
the test, 202 
from a normal population, con- 
fidence limits for σ², 287-290
confidence limits for X, rela-
tionship between interval,
coefficient, and N, 272-273
σ known, 269-272
σ unknown, 280-283, 284
distribution of all possible sam-
ples, derivation, 255-256
geometrical representation,
256-258
properties, 257-259
in terms of their means and
variances, 259-262
distribution of samples of N = 2,
derivation, 221-224, 246-
247
geometrical measurement of
$\sqrt{N}(\bar{X} - X)/s$, 228-229
geometrical measurement of
σ, 226-228
geometrical measurement of
σ², 226-228, 248
geometrical measurement of
X, 225-226, 248
numerical illustration, facing 
224 
properties, 224-229, 247-248 
in terms of their means and 
variances, 248-250 
Random sampling, from a normal
population, distribution of 
sample a's (=A.D./s), 
244-245 

use in testing departure from 
normality, 296-297 
distribution of sample means, 
derivation, 262-263 
formula, 231 
(N = 2), derivation, 229-231, 
250-251 
use in making inferences 
about population mean, 
267-273 
distribution of sample medians, 
241-242 
distribution of sample
$\sqrt{N}(\bar{X} - X)/s$,
derivation, 264-266
formula, 241
(N = 2), derivation, 236-
241, 252-253

use in making inferences 
about population mean, 
273-284 

distribution of sample ranges, 

242 

use in analysis of variance, 
295 


use in making inferences 
about population standard 
deviation, 294-295
use in quality control, 295- 
296 
distribution of sample standard 
deviations, derivation, 264 
formula, 236 
(N = 2), derivation, 236 
distribution of sample variances, 
derivation, 263-264 
formula, 234 
(N = 2), derivation, 232-
233, 251-252
relation to χ² distribution,
234-236, 264
use in making inferences
about population variance,
284-294
Random sampling, from a normal
population, distributions of
g statistics, 243
use in testing departure from 
normality, 296-297 
distributions of sample betas, 
242, 243-245 
use in testing departure from 
normality, 296-297 
joint confidence zone for X 
and σ, 366-370
joint distribution of means
and standard deviations,
derivation, 340-344
formula, 344
numerical illustrations, 343,
346
joint maximum-likelihood esti-
mates of X and σ, 370-371
maximum-likelihood estimate of
σ², 290-294
maximum-likelihood estimate of
X, σ known, 273
σ unknown, 283-284
testing departures from nor-
mality, by use of β1, β2, and
a, 296-297 
testing difference between 
means of two correlated sam- 
ples, 403-406 
testing difference between 
means of two independent 
samples, a common known 
σ, 398-400
a common unknown σ, 400-
402
unequal σ's, 403
testing the difference between 
variances of two inde- 
pendent samples, both di- 
rections considered, 411- 
413 
only one direction considered, 
406-411 
testing hypotheses about both 
X and σ, 344-366
equations for λ contours, 353

Random sampling, from a normal
population, testing hy-
potheses about both X and
σ, lambda (λ) probability
tables, 361, 362
regions of rejection, 344-361
using corner region, 364-366
using λ contours, 345-354
testing hypotheses about σ²,
284-287, 289-290
testing hypotheses about X, σ
known, 267-269
σ known compared with σ
unknown, 277-279
σ unknown, 273-276, 284
testing whether two samples 
are from same population, 
413-416 
from a normal bivariate popula- 
tion, confidence limits for, 
correlation coefficient (r), 
301-302 
regression parameters, two 
variables, 377 
more than two variables, 
380 
confidence limits for X;, 387- 
390 
distribution of regression sta- 
tistics, 375-376 
distribution of sample correla- 
tion coefficients, 298-300 
distribution of sample lines of 
regression, 380-381 
distribution of sample z's, 299-300
limiting loci for population 
line of regression, 382-384 
maximum-likelihood estimate of 
correlation coefficient (r), 
302-303 
regression statistics, 372-374 
testing hypotheses about, correlation coefficient (r), 300-301
population regression para- 
meters, 376-377 
X1, 381-382 
Random sampling, from a normal


bivariate population, test- 
ing difference between cor- 
relation coefficients of two 
independent samples, 417 
regression statistics of two
independent samples, 417-419
testing for linearity, 306-307 
testing significance of 7, 306 
testing significance of I, 306 
from a normal multivariate popu- 
lation, confidence limits 
for, higher order variances, 
386-387 
multiple correlation coefficient (R1.23...n), 305
population regression para- 
meters, 380 
distribution of higher order 
variances, 385-386
distributions of regression statistics, 377-378
distribution of Xi, 385
inferences about partial cor- 
relation coefficients, 303-304 
limiting loci for the population 
plane of regression, 385 
maximum-likelihood estimates 
of, higher order variances, 
387 
multiple correlation coefficient, R1.23...n, 305-306
regression Statistics, 374-375 
testing hypotheses about, higher 
order variances, 386 
multiple correlation coefficient R1.23...n, 304-305
a regression parameter, 378-380
Xj, 385 
meaning, 154, 155-156 
and probability, 154 
of public opinion, 330-331 
representative, 183-184 
statistical inferences from (see 
Statistical inference) 
stratified, 183-184 
Random sampling, technique of,
mechanical randomizing de- 
vices, 157-158 
natural selection, 161-162 
ordinal selection, 156-157 
random sampling numbers, 159-161
tables of numbers, 158-159 
Random Sampling Numbers, 159 
Range, absolute, definition, 13 
relative, definition, 13 
sampling distribution, 242, 482 
probabilities, table of, 482 
use in quality control, 295-296 
Rank correlation, 452-453 
Reciprocals of numbers, table, 467-468
Region of acceptance, 164 
Region of rejection (see Statistical
inference, testing hypotheses,
region of rejection)
Rietz, Henry L., 92 
Robinson, G., 93 


S


Sampling, purposive, 184-185 
random (see Random sampling) 
reasons for, 153-154 

Sampling distribution, of a =
A.D./σ, 244-245
table of .01, .05, and .10 
points, 481 
of √β₁, 242-244
table of .05 and .10 points, 480 
of β₂, 242-245
table of .01 and .05 points, 480 
of the correlation coefficient (r),
298-300
of the difference between two 
means, 399, 401 
percentages, 393-394
z's, 417
explanation of, 164 
of g statistics, 242-243
of a higher order variance, 385-386
of k statistics, 242-243
of L, 412 
Sampling distribution, of Хи, 415
probabilities, table of, 415 
of a linear function, 108-109 
of the mean, any population, 
107-108 
normal population, 231, 262-263 
special nonnormal populations, 
445-446 
of the median, 241-242 
of the multiple correlation coeffi- 
cient, 304-305 
of √N (X − X)/Z, nonnormal
populations, 449
normal population, 241, 264-266
special nonnormal populations,
447-448
one general distribution, 114
of partial correlation coefficient, 
303 
of a percentage, 190-194, 210,
211-212 
of the ratio of two maximum- 
likelihood estimates of vari- 
ance, 409 
of a regression coefficient, 108-109 
of the standard deviation, normal 
population, 236, 264 
special nonnormal populations,
446-447
standard error of, 164
of z = tanh⁻¹ r, 299-300
use of various distributions in
making inferences (see Analysis of variance; Random
sampling; Statistical inference)
of the variance, nonnormal popu- 
lations, 450 


normal population, 234-236, 
263-264 

special nonnormal populations,
446-447 


Semi-invariants (see Cumulants) 

Sheppard, W. F., 132 

Sheppard’s corrections (see Fre- 
quency curves, fitting of, Shep- 
pard’s corrections) 


Skewness, 11 
Smith, B. Babington, 159, 160 
Squares of numbers, 100-1000 table, 
461-462
Square roots of numbers, 100-1000 
table, 689-690 
Standard deviation, definition, 12 
first order, 18-19 
higher order, 18 
sampling distribution, 236, 264, 
446-447 
Standard error, of √β₁, 242, 244
of β₂, 242, 244
definition, 164
of the difference between two
sample means, 399
percentages, 394
regression statistics, 418
z's, 417
of g₁, 243
of g₂, 243
of the mean, 231, 268 
of the median, 173, 242 
of a percentage, 190, 195, 210-211, 
215, 315, 319
of the range, 242, 482 
of regression statistics, more than 
two variables, 378 
two variables, 375 
of a sum or difference, correlated 
variables, formula, 406 
independent variables, formula, 
392 
proof of formulas, 420-421
of the variance, 289 
of Xj, 381, 382, 385 
of z = tanh⁻¹ r, 301
Statistic, 5, 181
Statistical inference, confidence
intervals, confidence coefficient, 175, 176
confidence limits, 176 
definition, 174-175 
determination of, 175-176 
arbitrary elements in, 177- 
178 
effect of size of sample, 179 
Statistical inference, estimation of
population parameters, 174-185 
maximum-likelihood estimates of 
population parameters, con- 
sistency of, 182 
efficiency of, 182 
method, 179-182 
as "optimum" statistics, 182- 
183 
as "unbiased estimates," 424 
sufficiency of, 183 
symbolic representation, 10 
sampling distributions, explana- 
tion, 164 
standard error, 164 
testing hypotheses, arbitrary ele- 
ments in, 165-168 
caution in, 339 
coefficient of risk, 164 
effect of size of sample, 171-172 
effect of statistic selected,
172-174 
error I, definition, 163-164 
limiting risk of, 164-168 
error II, definition, 164 
minimizing risk of, 168-171 
power of a test, 198, 199 
the problem, 163 
region of acceptance, 164 
region of rejection, definition, 
164 
selection of, 166-171 
unbiased region, 201 
theory of, 163-185 
(See also Random sampling) 
Stirling’s approximation for fac- 
torials, 70-71 
Symbols, table of, used in this book, 
xi 
Symmetrical binomial distribution, 
betas, 34 
Symmetrical binomial distribution,
characteristics, 34 
conditions leading to, 37-38 
derivation of, 32-33 
formula, 33 
mean, 34 
numerical example, 33 
relative slope of, 74-75 
significance of, 35-36 
standard deviation, 34


T


Tables for Statisticians and Bio- 
metricians, 148, 149 

Tchebycheff's inequality, 453-455

extension of, 454-455

t distribution (see Frequency curves,
t distribution)

Test of goodness of fit (see Frequency
curves, testing goodness of fit)

Testing hypotheses (see Statistical 
inference, testing hypotheses) 

Test of independence (see Inde- 
pendence, test of) 

Tippett, L. H. C., 159 

Tracts for Computers, 117, 159 


U 


United States census (1940), 162 
Universe (see Population) 
Whittaker, E. T., 93
Williams, P., 244 
z transformation, 299-300